.Sec(Symbolic expressions,Symbolic expressions)
.SS(Introduction)
This book is a study of data structures and programming languages;
in particular it is a study of data structures and programming
languages centered around the language LISP. However, this is not
a manual to help you become a proficient LISP coder.
We will study many of the formal and theoretical aspects of languages and
data structures as well as examining the practical applications of
data structures. We will show that this area of computer science
is a discipline of importance and beauty, worthy of careful study.
How are we to proceed? How do we introduce rigor into a field
whose countenance is as %3ad hoc%* and diverse as that of programming?
We must also bear in mind that the results of our studies are to have
practical applications.
We must not pursue theory and rigor without proper
regard for practice. Our study is not that of pure mathematics; our
results will have applications in everyday programming practice.
However, for guidance let's look at mathematics. Here is a well-established discipline
rich in history and full of results of both practical and theoretical
importance. Will our comparison of mathematics and programming languages
bear fruit or will it simply show our impudence and naivete? We shall see.
One of the more fertile, yet easily introduced, areas of mathematics is
elementary number theory. It is easy to introduce because
everyone knows something about the natural numbers.
Number theory studies properties of a certain class of operations definable
over the set %dN%* of non-negative integers, also called the natural numbers.
A very formal presentation might begin with a construction of %dN%*
from more primitive notions, but it is usually assumed that the reader
is familiar with the fundamental properties of %dN%*.
In either case the next step would be to define the
class of operations which we would allow on our domain.
We shall begin our study of LISP in a similar manner,
as an investigation of a certain
class of operations definable over a domain of objects, called Symbolic Expressions.
Though most people
know something about the natural numbers, we are perhaps not so fortunate when it
comes to Symbolic expressions.
We must define what we mean by "symbolic expression".
Let's look to mathematics again
for help. If we asked someone to define the domain %dN%*, the definition
we would receive
would depend on how familiar that individual was with the properties of the
natural numbers.
For most people and purposes, the following characterization of a natural number
is satisfactory:
.BEGIN TABIT1(10);
%aI:%* \A natural number is a sequence of decimal digits.
.END
However, this description is mathematically quite superficial and is completely
inadequate for the purposes of discussing properties of %dN%1.
Clearly all of the
information we know about the relationships between natural numbers is lacking
in this description. The "meaning" of the natural numbers is missing. It is
like giving a person an alphabet and rules for forming syntactically
correct words but not supplying a dictionary which relates these words to
the person's vocabulary.
If pressed for details we might attempt a more adequate
characterization like the following:
.BEGIN GROUP;TABIT1(15);
\%21.%* %3zero%* is an element of %dN%*.
%aII:%*\%22.%* If %3n%* is in %dN%* then the %3successor%* of %3n%* is in %dN%*.
\%23.%* The only elements of %dN%* are those created by %21.%* and %22.%*.
.END
Definition %aII%* appears to be completely at the other end of the spectrum;
it tells us very little about the appearance of the integers. It gives us an
initial element %3zero%* and a mysterious operation called %3successor%*,
which is supposed to exhibit a new element, given an old one.
And unless we are careful about the meaning of %3successor%*, definition %aII%*
will be inadequate. For example if we define the
successor of a natural number to be that same number then %aII%* is satisfied
but unsatisfactory.
.BEGIN TURN ON "#";
How should we describe %3successor%* so that our intentions are captured?
A sufficient way is to give a definition of %3successor%* as a mapping
or function from natural numbers to natural numbers.
To give such an explicit definition requires some notation: let %30%* be a
notation for %3zero%*; then
we define a function %2S%* such that the successor of
%3zero%* --#called %3one%*#-- is denoted by %2S%3(0)%*, etc.
.END
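For illustration, here is a minimal sketch of %3zero%* and %3successor%* as
constructors, written in modern Lisp notation rather than in the notation of this
text; the names %3zero%* and %3succ%* are our own choices.
.BEGIN NOFILL;GROUP;
;; A minimal sketch, in Common Lisp, of zero and successor as constructors.
(defconstant zero 'zero)
(defun succ (n) (list 'succ n))
;; (succ (succ zero)) evaluates to (SUCC (SUCC ZERO)), a representation of two
;; which, like the S-notation, exhibits how the number was constructed.
.END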
The characterization of decimal digits given in %aI%*
is syntactic.
The notation itself tells us nothing about the interrelationships between the
numbers, but it does give us a notation for representing them.
Thus %32%* can be used to represent %3two%* or
used as an abbreviation for %2S%3(%2S%3(0))%1.
One benefit of the %2S%*-notation is that it explicitly shows the
means of construction. That is, it shows more of the properties of
these numbers than just their distinguishability.
We shall refer to the digit representation as %2numerals%* and reserve the
term, %2natural number%1, for the abstract object.
Thus numerals denote, stand for, or represent the abstract objects called
natural numbers; and definition %aI%1 is better stated as: "a natural number
can be represented as a sequence of digits".
But notation and syntax are necessary and we must be able to give precise
descriptions of syntactic notions.
Given a choice between the two previous definitions, %aI%* and %aII%*,
it appears that %aII%* is more precise.
Much less is left to the imagination; given
%3zero%* and a definition of %3successor%* the definition will act as a
recipe for producing elements of %dN%*. This style of definition is called an
⊗→inductive definition↔←. The basic content of an inductive definition
of a set of objects consists of three parts:
.BEGIN GROUP;
.P117:
.BEGIN INDENT 10,10;
(1) A description of an initial set of objects; the elements of this
set are to be included automatically in the set we are describing in the
inductive definition.
.END
%aIND%*
.BEGIN INDENT 10,10;
(2) Given the description of some existing elements
in the set, we are given a means of constructing more elements.
(3) An extremal clause, saying that the only elements in the set are those
which gained admittance by either (1) or (2).
.END
.END
Clearly our definition of %dN%* in terms of %3zero%* and %3successor%*
is an instance of %aIND%*: we are defining the set of natural numbers:
%3zero%* is initially included in the set; then applying the second phrase
of the definition we can say that %3one%* is in the set since %3one%* is the
successor of %3zero%*.
We can recast the positional notation description as an inductive definition.
.BEGIN GROUP;TABIT1(15);
\%21.%* A numeral is a digit.
\%22.%* If %3n%* is a numeral then %3n%* followed by a digit is a numeral.
\%23.%* The only numerals are those created by %21.%* and %22.%*.
.END
In words, "a numeral is a digit, or a numeral followed by a digit".
In this application of %aIND%*, the initial set has more
than one element; namely the ten decimal digits.
Also we assume that the questioner knows what "digit" means.
This is a characteristic of all definitions:
we must stop %2somewhere%* in our explication.
Notice too that we assume that "followed by" means juxtaposition.
Inductive definitions have been the province of mathematics for
many years; however, computer science has encroached a bit and has
developed a style of syntax specification
called BNF (Backus-Naur Form) equations which has the same intent as
that of inductive definitions. Here is the previous inductive definition
of "numeral" as a set of BNF equations:
.BEGIN TABIT1(15);
<numeral>\::= <digit>
<numeral>\::= <numeral><digit>
As an abbreviation, the two BNF equations may also be written:
<numeral>\::= <digit>|<numeral><digit>
.END
A comparison between the BNF and the inductive descriptions of
"numeral" should clarify much of the notation, but to be complete
we give a more detailed analysis.
The symbol "::=" is to be read "is a",
the symbol "|" is to be read "or".
The strings beginning with "<" and ending with ">" stem from the
references to "numeral" and "digit" in %21%1 and %22%*;
by convention, components of BNF equations which %2describe%* elements
are enclosed in "<" and ">"; and elements which are given %2explicitly%*
are written without the "< >" fence.
Thus "<digit>" is not a numeral but is a description; to make the definition
of <numeral> complete we should include an equation like:
.BEGIN TABIT1(15);
<digit>\:: = %30 |1 |2 |3 |4 |5 |6 |7 |8 |9%*
.END
Juxtaposition of objects implies concatenation of the syntactic objects. Thus
"89" is an instance of "<numeral><digit>".
What should be remembered from the discussion in this section?
First we wanted and needed precise ways of describing the elements of our study
on data structures. We have seen that inductive definitions are a powerful
way of describing sets of objects. We have seen a variant of inductive definitions
called Backus-Naur Form equations. We will use BNF equations to describe the
syntax of our data structures and our language.
Second, we have begun to see the difference between an abstract
object and its representation. This distinction has been well
studied in philosophy and mathematics, and we will see that this idea
has strong consequences for the field of programming and computer
science in general. Representation of abstract objects will play a
crucial role in this text.
.SS(Symbolic Expressions: abstract data structures)
We now wish to apply the techniques and ideas which we have developed in
the previous section. We wish to show that careful definitions and
use of abstraction will benefit the study of data structures and LISP.
To begin our study we should therefore characterize the domain
of LISP data structures in a manner similar to what we did for numbers.
Our objects are called %2Symbolic Expressions%*.
Our domain of Symbolic Expressions is named %d<sexpr>%*.
⊗→Symbolic expressions↔← are also known as %6⊗→S-expressions↔←%*
or %6⊗→S-exprs↔←%*.
.BEGIN TURN ON "#";
The set of symbolic expressions is defined inductively over a base set
named %d<atom>%*.
The set %d<atom>%* can itself be defined inductively. We give the BNF equations
for elements of %d<atom>%* below, but the essential character of the
domain is that it contains two kinds of objects: the %2⊗→literal atoms↔←%*
and the signed numerals.
The elements of %d<atom>%* are called %2atoms%*.
.END
.GROUP SKIP 2;
.BEGIN TABIT1(15);
.GROUP;
.P115:
<atom>\:: = <literal atom>|<numeral>| -<numeral>
<literal atom>\:: = <atom letter>|<literal atom><atom letter>|<literal atom><digit>
<numeral>\:: = <digit>|<numeral><digit>
<atom letter>\:: =%3 A |B |C ...| Z%* ⊗↓We use ellipses here as a convenient abbreviation.←
<digit>\:: = %30 |1 |2 ... |9%*
.APART;
.END;
Thus a literal atom is a string of uppercase letters and digits, subject
to the provision that the %2first%* character in the atom be a letter.
.GROUP SKIP 2;
.BEGIN TABIT2(20,35);GROUP;
For example:\atoms\not atoms
%3\ABC123\2a
\12\a
\A4D6\$$g
\NIL\ABD.
\T\(A . B)
%*
.group skip 1;
.END;
The characteristic of atoms which most interests us is their distinguishability:
the atom %3ABC%* is distinguishable from the atom %3AB%*. That the string
%3AB%* is a
part of the string %3ABC%* is not germane to our current discussion⊗↓However, we
will discuss such topics in {yonss(P27)} on string processing.←.
Similarly, we will seldom need to exploit numerical relationships underlying
the numerals. At best we will use simple counting properties.
Thus most of our discussions will deal with non-numeric atoms.
Most implementations
of LISP do however contain a large arithmetic entourage.
Many implementations also
give a wider class of literal atoms, allowing some special characters to appear;
for most of our discussion the above class is quite sufficient.
The domain of Symbolic expressions, called %d<sexpr>%* is defined inductively over
the domain %d<atom>%*⊗↓We will not give the extremal clause,
but it is assumed to hold.←.
.BEGIN INDENT 15,15;
%21.%* Any element of %d<atom>%* is an element of %d<sexpr>%*.
%22.%* If %9α%41%1 and %9α%42%1 are elements of %d<sexpr>%*,
then the %2⊗→dotted-pair↔←%* %3(%9α%41%3.%9α%42%3)%1 is in %d<sexpr>%*.
.END
Thus %d<sexpr>%* includes %d<atom>%* as a proper subset.
The notation we chose for the dotted-pairs is the following:
.BEGIN INDENT 0,10;
A dotted-pair consists of a left-parenthesis followed by an
S-expr, followed by a period, followed by an S-expr,
followed by a right-parenthesis.
.END
In the definition of %d<sexpr>%* we have introduced %9α%4i%1 as
%2⊗→match-variable↔←s%*.
Greek letters %9α%* and %9β%* will be used throughout the text in several contexts
to designate pattern matches. For example, if we let %9α%41%1 be
%312%* and let %9α%42%1 be %3ABC%* then %3(%9α%41%3.%9α%42%3)%1 is
%3(12#.#ABC)%* or if we let %3(A#.(B#.#C))%* be %3(%9α%*#.#%9β%3)%1 then
%9α%1 is %3A%* and %9β%* is %3(B#.#C)%1.
Finally here's a BNF description of the full set of S-expressions.
.BEGIN TURN ON "←";
.P47:
←<sexpr> :: = <atom>|%3(%*<sexpr> . <sexpr>%3)
.END;
Notice that if we allow floating point numbers as atoms, some care needs to be
exercised when writing S-expressions. How should %3(A.1.2)%* be interpreted?
Is it the dotted pair %3(A . 1.2)%*, or is it just an ill-formed expression?
Evaluation of such ambiguous constructs will depend on the implementation; such
details do not interest us yet.
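For the curious, here is how one modern dialect happens to resolve that
ambiguity; the sketch below is in Common Lisp and is an illustration only, not
part of the language described in this text.
.BEGIN NOFILL;GROUP;
;; The Common Lisp reader requires white space around the pair dot, so the
;; two readings are written differently.
'(A . 1.2)       ;; the dotted pair of the atom A and the number 1.2
'(A.1.2)         ;; a one-element list containing the single atom named A.1.2
'((A . B) . C)   ;; pairs nest; the first component is itself a dotted pair
.END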
.<<Examples: s-exprs not sexprs>>
.BEGIN TABIT2(20,40);
.GROUP;
Examples:\S-exprs\not S-exprs
%3
\A\A . B
\(A . B)\(A . B . C)
\(((A.B) .C) . (A.B))\((A . B)))
%*
.APART;
.END;
Recall our caveat on numerals and numbers. It also applies here. When we
described the domain %d<sexpr>%*
we picked a %2specific%* syntactic representation for its elements.
It will be a
convenient notation since it makes explicit the construction of the composite
S-expr from
its components⊗↓Just as the "successor" notation shows the construction
of the numbers from %30%*. This kind of notation will be much more useful
in LISP, since our interest in data structures will focus on the
construction process and the interrelationships between components of an
S-expr.←,
and the notation is also consistent with LISP history.
However there is more to the domain %d<sexpr>%* than syntax,
just as there is more to %dN%* than positional notation⊗↓%32%*,
II in Roman numerals, 10 in binary, "zwei" in German ... are all
representations of the same number.←.
What %2are%* the essential features of S-expressions?
Symbolic expressions are either
atomic or they have two components.
If we are confronted with a non-atomic S-expression
then we want a means of distinguishing between the
"first" and the "second" component.
The "dot notation" does this for us, but
obviously "(", ")", and "." of the dotted-pairs are simply
notation or syntax. We could have just as well represented
the dotted-pair of %3A%* and %3B%*
as the set-theoretic ordered pair, %3<A,B>%* or
any other notation which preserves the essentials of the domain %d<sexpr>%*.
.BEGIN TURN ON "#";
The distinctions between abstract objects and
their representation
are quite important. As we continue our study of more and more complex
data structures the use of an abstract data structure instead of one
of its representations can mean the difference between a clear and clean program
and a confusing and complicated program.
There are similar gains for us when we study algorithms defined over these
abstract data structures. The less the algorithm knows about the representation
of the data structure, the easier it will be to modify or understand that
algorithm. Indeed you may have
already experienced this phenomenon if you have programmed.
A program written in a high-level language is almost always more understandable
than its machine-language counterpart --#the benefits of a high-level language
are primarily notational rather than computational. The high-level program
is more abstract whereas the machine-language program knows a great deal about
representations.
Finally, if you still doubt that representations
make a difference in clarity, try doing long division in Roman numerals.
We will say much more about abstraction and representation in algorithms and
data structures as we proceed.
.END
.SS(Trees: representations of Symbolic expressions)
Besides the more conventional typographical notations, S-expressions
also have interesting %2graphical%* representations.
S-exprs have a natural
interpretation as a structure which we call a LISP-tree or L-tree.
.GROUP SKIP 1;
.GROUP;
Here are some L-trees:
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
/\ /\
L / \ R / \R
/ \ L/ /\
L /\R L/\R / L/ \R
/ \ / \ ∞3A∞* ∞3B∞* ∞3NIL∞*
∞31∞* ∞32∞* ∞3A∞* L/\R
∞3D∞* ∞3E∞*
.END
.APART;
.select 1;
We can give an inductive definition:
.BEGIN INDENT 15,15;
%21.%* Any labelled terminal node is an L-tree. A labelled terminal node
is a symbol, %9.%*, with an element of %d<atom>%* as a subscript.
For example, %9.%4ABC%1.
%22.%* If %3n%41%1 and %3n%42%1 are L-trees, then attaching the tails
of the arrows %9L%* and %9R%* to %9.%*, and the heads of the arrows to
%3n%41%1 and %3n%42%1, also forms an L-tree.
.END
Most important: there are no intersecting branches. We will talk about more
general structures later (called list-structures or directed graphs).
You can see how to interpret S-exprs as L-trees. The atoms are
interpreted as terminal nodes; and since non-atomic S-exprs always
have two sub-expressions we can write the first
subexpression as the left branch of an L-tree and the second sub-
expression as the right branch.
Typically we leave off the %4L%* (left) and %4R%* (right) subscripts since it is clear
from context which they are.
For example:
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 30,55;
∞3(A . B)∞* ?∞3(A . (B . C))∞* ?∞3((A . B) . C)∞*
/\ ? /\ ? /\
/ \ ? / \ ? / \
∞3A∞* ∞3B∞* ? / /\ ? /\ \
? / / \ ? / \ \
? ∞3A∞* ∞3B∞* ∞3C∞* ? ∞3A∞* ∞3B∞* ∞3C∞*
.END
.GROUP
.SELECT 1;
.P102:
Other representations of LISP-trees which will be more suggestive when
we talk about implementation of LISP are:
%3
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂αααπααα⊃ ⊂αααπααα⊃
ααααα→~ # ~ #αβααααααααααα→~ # ~ # ~
%αβα∀ααα$ %αβα∀αβα$
↓ ↓ ↓
∞3A∞* ∞3B∞* ∞3C∞*
.END
.APART
.GROUP
%1or equivalently:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α";
.TURN ON "∞" FOR "%";
⊂αααπααα⊃ ⊂αααπααα⊃
ααααα→~ A ~ #αβααααααααααα→~ B ~ C ~
%ααα∀ααα$ %ααα∀ααα$
.END
.APART
.SELECT 1;
This representation (for the S-expr %3(A .(B . C))%1 ) is called %2⊗→box-notation↔←%*.
Again please keep in mind the distinction between the abstract object
referred to as a symbolic expression and the several representations
which we have shown.
The question of representation is so important and will occur so frequently that
we introduce notation for a representational mapping, %9r%*.
To represent domain %dD%1 in domain %dE%1, we will define a
function %9r%4D→E%1 which usually will be inductively given, and
will express the desired mapping.
For example a representational mapping %9r%4<sexpr>→L-tree%1 can be given:
.BEGIN CENTERIT;
←%9r%f(%1<atom>%f)%1 = %9.%4<atom>%1
←and for %9α%* and %9β%* in %d<sexpr>%*:
.END
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "?" FOR "\"
.TURN ON "∞" FOR "%";
.TABS 25,37;
?∞9r∞f(∞3(∞9α∞* . ∞9β∞*)∞f)∞1 =∞G? /\
?? / \
?? / \
??∞9r∞f(∞9α∞f)∞g ∞9r∞f(∞9β∞f)∞g
.END
Typically context will determine the appropriate subscript on the %9r%1-mapping;
thus we will omit it.
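To make the idea of a representational mapping concrete, here is one such
mapping written as a program. The sketch is ours, in Common Lisp rather than in
the notation of this text; it uses the selectors %3car%* and %3cdr%* introduced
in the next section, and the names %3leaf%*, %3node%* and %3sexpr-to-ltree%*
are hypothetical.
.BEGIN NOFILL;GROUP;
;; A sketch of the mapping from S-expressions to L-trees, with L-trees
;; represented by explicit LEAF and NODE records.
(defstruct leaf label)
(defstruct node left right)
(defun sexpr-to-ltree (s)
  (if (atom s)
      (make-leaf :label s)                          ; an atom maps to a labelled terminal node
      (make-node :left  (sexpr-to-ltree (car s))    ; a pair maps to a node whose branches
                 :right (sexpr-to-ltree (cdr s))))) ; are the images of its two components
;; (sexpr-to-ltree '(A . (B . C))) builds a node whose left subtree is the leaf A.
.END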
.REQUIRE "PROB1.PUB" SOURCE_FILE;
.SS(Primitive Functions)
%1
So far we have described the domain of abstract objects called Symbolic
Expressions and have exhibited several representations for these objects.
We will now describe some functions or operations to be performed on this domain.
We need to be a bit careful here. We are about to see one of the main
differences between mathematics and computer science: mathematics
emphasizes the idea of function; computer science emphasizes the idea of
algorithm, process, or procedure.
.BEGIN TURN OFF "{,}";TURN ON "#","←";
What is a function? Mathematically a function is simply a
mapping such that for any given argument in the domain of the function there
exists a unique corresponding value.
In elementary set theory, a definition of function %3f%* involves saying that %3f%*
is a set of ordered pairs %3f#=#{#<x%41%*,#y%41%*>,#...}%1; the %3x%4i%1's are
all distinct and the value of the function %3f%* for an argument %3x%4i%1 is
defined to be the corresponding %3y%4i%1.
With either definition no rule of computation is given to help
locate values; with the first definition it is implicit that the internal structure
of the mapping doesn't matter; in the set-theoretic definition, the correspondence
is explicitly given.
An algorithm or procedure is a process for
computing values for a function. The factorial function, %3n!%*,
can be computed by many different algorithms; but thought of as a function it is
a set
←%3{<0,1>, <1,1>, <2,2>, <3,6>, ...<n,n!>, ...}%1
.END
For the initial rash of primitives, the distinction between function and algorithm
will not be too apparent. However we will soon remedy that situation.
.BEGIN TURN ON "#";
Now for some terminology:
the %2domain%1 of a function is the set of all values for which the function is
defined; the %2range%1 of a function is the set of all values which the function
takes on.
A careful definition of a function, %3f%*, requires specification of a
further set; for lack of a better name, call it the %2domain of discourse%1.
The domain of discourse, named %aD%*,
consists of all possible values which may occur
as the argument to a function.
If the domain of a particular function %3f%* coincides with %aD%* then %3f%*
is said to be a %2⊗→total function↔←%* (over %aD%*); if there are
elements of %aD%* which are not in the domain of %3f%* then %3f%* is a
⊗→partial function↔←, and %3f%* is said to be undefined for those values.
For example, the factorial function is typically considered to be partial over
the integers, total for the natural numbers, and undefined for negative integers.
Thus the concept of "total" or "partial" is relative to a specified
domain of discourse.
However, a function %3f%1 total over a domain %aD%41%1 can be extended to be
total over a domain %aD%41%1∪%aD%42%1 by assigning values to %3f(d)%1 for
%3d%9ε%aD%42%1. Thus factorial can be extended to be total over the integers
by defining %3n!%1 to be %30%1 for %3n%1 less than %30%1.
We may also extend the range of a function when we extend the domain; thus
%3f(d)%1 need not be in the range of the original %3f%1.
For example, we added %30%1 to the range when we extended the factorial
function. When we extend the range
we must specify what additions are made.
A substantive decision needs to be made on how we are to handle partial functions.
Since we are attempting to be reasonably realistic about our modelling of
computation we should be as precise as possible in our formalism.
We could introduce a class of error values and include them in the range of
%3f%*; these values would be given as the result of applying %3f%*
to an argument not in its domain; or
we could simply say that the result is "unspecified"
⊗↓Indeed, how "unspecified" manifests itself on a machine will depend on the implementation.
Sometimes error messages are given; sometimes it results in an excursion into
the subconscious.←.
We shall pick an intermediate position;
we shall introduce %2one%* new element,
%9B%*, called "unspecified" or "undefined", or "bottom"⊗↓"bottom" is
sometimes written %9W%*.←.
We will define all our functions
over domains augmented with this element;
thus constructs like %3f(%9B%*)#=#a%1 are allowed.
Think of %9B%1 as covering all anomalous conditions which could be detected
and printed as error messages.
.P181:
As we define new data structures we will frequently want to extend
our functions to larger domains. For most of our purposes,
a function %3f%* defined on (an augmented domain) %aD%* will be extended to
a larger domain,
%aD%1∪%aD%41%1, by defining %3f(d%41%*)#=#f(%9B%*)%1 for
.P183:
%3d%41%9ε%aD%41%1⊗↓The
exception to this extension convention involves the definition of predicates
which can tell whether or not an arbitrary element is in a specified domain.
These predicates always give true or false when applied to any element other than
%9B%1.←.
Note that %3f(%9B%3)%1 need not be %9B%1;
however
many of the functions which we will examine
are defined such that %3f(...,#%9B%*,#...)#=#%9B%1; that is %3f%* returns
the undefined value whenever any of its arguments are undefined.
Functions which possess this property are called %2⊗→strict functions↔←%1.
.END
To apply this discussion of %9B%* to S-exprs we will define an
extended domain %aS%* to be:
.BEGIN CENTER;TURN OFF "}","{";
%AS%* = %d<sexpr>%*#∪#{%9B%1}.
.END
Then we can talk about functions which are total over %aS%* or
total over %d<sexpr>%*,
and we will talk about functions which are partial over %d<sexpr>%*.
Typically when we ask if a function of symbolic expressions is partial or
total without specifying
a domain, we are asking the question over the natural, unextended domain, %d<sexpr>%*.
The first LISP function we consider is the
⊗→%3cons%*↔← function which is used to generate S-exprs from less
complicated S-exprs. %3cons%* is called a %2⊗→constructor↔←%*-function and
is a strict, binary function; it is a total function
over the domain %d<sexpr>%*⊗↓What we really mean is that
%2either%1 argument to %3cons%1 can be an arbitrary S-expr.←.
Thus %3cons%* expects two arguments and
gives %9B%* as value whenever either of its arguments is
%9B%*.
Whenever %3cons%* is presented with two elements %9α%* and %9β%* of %d<sexpr>%*,
%3cons[%9α%*;%9β%*]%1 returns a new S-expr %3(%9α%*#.#%9β%*)%1. That is,
interpreted as a LISP-tree, %3cons[%9α%*;%9β%*]%1 has a left branch %9α%*
and has a right branch %9β%*.
.GROUP;
For example:
.BEGIN CENTERIT;SELECT 3;
←cons[A; B] = (A . B)
←cons[(A . B); C] = ((A . B) . C)
.END;
.APART
Expressions like the above, which can be evaluated, are called %2⊗→forms↔←%1.
S-exprs are forms since they are the constants of our language:
the value of a constant is that constant.
Function applications are forms: the value is the result of
performing the designated function on the
designated arguments.
.P127:
Notice that we are designating function application in LISP by
"function name, followed by a list of arguments delimited by `[' and
`]'⊗↓The syntax equations for forms are given
on {yon(P66)}.←." The `[...]'-notation is part of the LISP syntax and we will reserve
`(...)'-notation for the function application of mathematics.
In a few places in our discussions the distinction will be important.
Typically the distinctions will occur when we wish to distinguish between
the LISP algorithm and the mathematical function computed by that algorithm.
.<<car and cdr>>
We have two strict, unary %2⊗→selector↔←%1 functions, %3car%* and %3cdr%*⊗↓These strange
names are hold-overs from the original implementation of LISP on an ancient
IBM 704. That machine had partial-word instructions to reference the
%da%1ddress and %dd%1ecrement parts of a machine location. The %3a%* of
%3car%* comes from "address", the %3d%* of %3cdr%* comes from "decrement".
The %3c%* and %3r%* come from "contents of" and "register".
Thus %3car%* could be read "contents of address part of register".←,
for traversing LISP-trees.
We already know the meaning of "strict"; a unary function expects %2one%* argument;
and a selector function is a data structure manipulating function which
will select a component of a composite data structure.
Both %3car%* and %3cdr%* are selectors since they will select components of
non-atomic elements of %d<sexpr>%*.
Thus %3car%* and %3cdr%* are both ⊗→partial functions↔← over %d<sexpr>%*;
they give values in %d<sexpr>%* only for non-atomic arguments;
they give %9B%* whenever they are presented with an atomic argument.
When given a non-atomic argument, %3(%9α%* . %9β%*)%1,
⊗→%3car%*↔← returns as value the first subexpression, %9α%1;
⊗→%3cdr%*↔←
(pronounced could-er) returns as value the second sub-expression %9β%1.
.GROUP;
For example:
.BEGIN CENTERIT;
←%3car[(A . B)] = A car[A] = %9B%*
←%3cdr[(A . B)] = B cdr[(A .(B . C))] = (B . C)
←car[((A . B) . C)] = (A . B)
%*
.END;
.APART
.GROUP;
As with most mathematical theories, we will allow ⊗→functional composition↔←.
The composition of two unary functions %3f%* and %3g%* is another
function, sometimes denoted by %3f%fo%*g%1. The value of an expression, %3f%fo%*g[x]%1,
is the value of %3f[g[x]]%*.
That is, the value of %3f%fo%*g[x]%1 is a %3z%*
such that %3y%* is the value of %3g[x]%*
and %3z%* is the value of %3f[y]%*. %3f%fo%*g%1 may be undefined for several reasons:
%3g[x]%* may be undefined, or %3f[y]%* may be undefined.
.APART
.GROUP
Here are some examples of composition:
.BEGIN CENTERIT;
%3
←car%fo%*cdr[(A .(B . C))] = car[cdr[(A .(B . C))]] = car[(B . C)] = B
←cdr%fo%*cdr[(A .(C . B))] = cdr[cdr[(A .(C . B))]] = cdr[(C . B)] = B
←cdr[cdr[A]] = %9B%*
←car[cdr[(A . B)]] = %9B%*
←car[cons[X;A]] = X cdr[cons[Y;X]] = X .
%*
.END;
.APART
Notice that the compositions which give result %9B%* do so by resorting
to our characterization of %3car%* and %3cdr%* as strict functions which
have been extended to %aS%*.
The composition of many %3car%* and %3cdr%* functions occurs so frequently that an
abbreviation has been developed.
Given such a composition, we select, in left-to-right order,
the relevant %3a%*'s and %3d%*'s in the %3car%*'s and %3cdr%*'s. We sandwich this
string of %3a%*'s and %3d%*'s between a left-hand %3c%* and a right-hand %3r%*
and give the composition this name.
.BEGIN CENTERIT;GROUP;
For example:
%3
←cadr[x] <= car[cdr[x]]
←caddr[x] <= car[cdr[cdr[x]]]
←cdar[x] <= cdr[car[x]]
%*
.END;
These compositions are also called ⊗→%3car-cdr-%2chains%1↔←, and are useful in
traversing LISP-trees. The "<="-notation is to be read "is defined to be the
function ...".
This notation is only a temporary convenience and not part of LISP.
Very soon we will study what is involved in giving and using definitions in LISP
({yonss(P49)}).
For the moment intuition should suffice.
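For comparison, the same abbreviations exist in modern Common Lisp, where
%3cadr%*, %3caddr%* and %3cdar%* are supplied by the system; the sketch below
is ours and the name %3my-cadr%* is hypothetical.
.BEGIN NOFILL;GROUP;
;; Each car-cdr-chain is just the corresponding composition of CAR and CDR.
(defun my-cadr (x) (car (cdr x)))        ; how CADR could be defined
;; (cadr  '(A . (B . C)))        is B
;; (caddr '(A . (B . (C . D)))) is C
;; (cdar  '((A . B) . C))        is B
.END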
Notice that with these definitions we have introduced variables (over
S-exprs). In the sequel lower-case
identifiers⊗↓See {yon(P66)} for the BNF equations for <identifier>.← will be
used freely as variables. So for example %3Y%* and %3CAR%1 are atoms; %3y%*
and %3car%1 could be used as variables.
Be clear on the distinction between LISP variables like %3x, y%* or %3foo%*,
and the match variables⊗↓%1also called meta-variables← like %9α%* or %9β%*.
For %9α%* and %9β%* ranging over S-expressions, %3(%9α%*#.#%9β%*)%1 is a
well formed S-expr. The construction, %3(x#.#y)%*, is %2not%*
well-formed, but %3cons[x;y]%1 is correct.
.<<discussion of components of function def>>
It %2is%* useful now, however, to introduce some terminology for talking
about the components of a function definition. Let
.BEGIN CENTERIT; SELECT 3;
←f[x%41%*; ...; x%4n%*] <= %9x%3
.END
represent a typical definition. The %2name%* of the function is %3f%*;
the %2body%* of the function is the expression %9x%1.
The variables appearing after the function name
are called the %2⊗→formal parameters↔←%*.
A function is %2applied%* using the common
notation of %2⊗→function application↔←%1:
.BEGIN CENTERIT; SELECT 3;
←f[a%41%*; ...; a%4n%*]
.END
The %3a%4i%1's are called %2⊗→actual parameters↔←%1; for an
application to be well formed, the actual parameters must agree in
number with the formal parameters of the definition of %3f%* and they
are to be associated in a one-for-one order, %3a%4i%1 with %3x%4i%1.
Thus in the expression %3car[cdr[(A#.#B)]]%* the actual parameter
to the %3car%* function is %3cdr[(A#.#B)]%*, and the actual parameter to
%3cdr%* is %3(A#.#B)%*.
How the actual parameters are to be treated will be a large part of our
study.
.BEGIN TURN ON "#";
Once again we are getting a hint of differences between mathematics
and computer science.
Mathematically speaking, a composition of functions is simply another function
--#i.e.,#a mapping#-- and therefore nothing need be said about how to compute
composed functions. From a computational point of view,
we want to express evaluation
of expressions involving composed functions in terms of the evaluation of
subexpressions. This would allow us to describe a complex computation in terms
of an appropriate sequence of subsidiary computations.
One of the more natural ways to evaluate expressions involving compositions is
to evaluate the inner-most expressions first, then work outwards.
Assume arguments to multi-argument
functions are evaluated in left-to-right order. Thus:
.END
.BEGIN SELECT 3;GROUP;TABIT2(10,45);
\cons[car[(A . B)];cdr[(A . (1 . 2))]]\= cons[A;cdr[(A . (1 . 2))]]
\\= cons[A;(1 . 2)]
\\= (A .(1 . 2))
.END
Evaluation may seem to be a simple operation but
looks can be deceiving; evaluation is a very complex process.
The value of an expression may depend on the %2order%* in which we do things.
.P106:
For example consider
the evaluation of %3foo[car[A];#B]%1 where %3foo[x;y]#<=#y%1.
If we expect %3foo%* to be a strict function, then %3foo[car[A];#B]%1
must return %9B%*
even though it is reasonable to believe that the value of the computation should
be %3B%1 since %3foo%* does not depend on the value of its first parameter.
It appears that if we postponed the evaluation of the arguments until
those values were actually needed, then at least this problem would be solved.
However,
the consequences of defining a function to be strict
are severe; they cannot be sidestepped by resorting to different schemes
for evaluating arguments. There is an alternative strategy in assigning
strictness: we could examine the body of the function; if the function uses
all its parameters, then it is strict. If the function doesn't depend on one
or more parameters, then it is non-strict. Thus with this interpretation, %3foo%1
%2is%1 non-strict. We prefer the initial interpretation,
reasoning that, if a function is passed bad information, then we wish to
know about it, even if the function does not use that specious result.
Strictness is closely related to evaluation schemes for parameter passing.
Here are two common techniques:
.BEGIN TABIT1(7);
.P121:
%aCBV%*\Evaluate the arguments to a function; pass those evaluated arguments to the function.
.END
.BEGIN TURN ON "#";
This scheme, called %aC%1all %aB%1y %aV%1alue, is what we were informally using to evaluate
the previous examples.
An alternative evaluation process is %aC%1all %aB%1y %aN%1ame:
.BEGIN TABIT1(7);GROUP;
.P126:
%aCBN%*\Pass the unevaluated arguments into the body of the function.
.END
Assuming %3foo%1 is defined to be strict, then
%3foo[car[A];#B]%1 yields %9B%1 under either %aCBV%1 or %aCBN%1.
However if we define %3foo%1 to be
non-strict then %aCBV%1 and %aCBN%1 will both give value %3B%1.
With %aCBV%1, %3x%1 is bound to %9B%1; while with %aCBN%1 %3x%1 is bound to %3car[A]%1.
Further relationships between evaluation schemes and strictness
will be investigated.
On {Yon(P112)} we discuss non-terminating
computations.
In {yonsec(P113)} we will discuss evaluation techniques
and will give a precise characterization of the evaluation of LISP expressions.
On {yon(P40)} we will introduce a non-strict language construct
but,
until that time, intuitive application of %aCBV%1 will suffice.
What should be kept in mind from this discussion is that we must be
careful when discussing the process of evaluation; the function we are
characterizing by computing its values will often depend on our choice
of evaluation scheme.
.END
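The distinction can be acted out in a modern dialect. The sketch below is
ours, in Common Lisp, whose ordinary evaluation rule is %aCBV%1; %aCBN%1 is
simulated by passing a thunk, a parameterless function, which is simply never
called. The names %3foo-by-value%* and %3foo-by-name%* are hypothetical.
.BEGIN NOFILL;GROUP;
;; Two calling styles for the earlier definition foo[x;y] <= y.
(defun foo-by-value (x y)
  (declare (ignore x))
  y)                                        ; x was evaluated before entry
(defun foo-by-name (x-thunk y)
  (declare (ignore x-thunk))
  y)                                        ; the thunk is never forced, so x is never evaluated
;; (foo-by-value (car 'A) 'B)              typically signals an error, the analogue of the
;;                                         undefined value, since (car 'A) is evaluated first.
;; (foo-by-name  (lambda () (car 'A)) 'B)  returns B; the offending argument is never evaluated.
.END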
Before introducing a further class of LISP expressions we summarize
the
syntax of the LISP expressions (or forms) allowed so far:
.BEGIN TABIT1(17);GROUP;
.P66:
<form>\::= <constant> | <function-part>[<arg>; ...;<arg>] | <variable>
<constant>\::= <sexpr>
<function-part>\::= <identifier>
<arg>\::= <form>
<variable>\::= <identifier>
<identifier>\::= <letter> | <identifier><letter> | <identifier><digit>
<letter>\::= %3a | b | c ... | z
.END
.P130:
The use of ellipses in the last equation is an abbreviation we have seen before.
The use of ellipses in the first equation is different. It is
an abbreviation meaning "zero or more occurrences".
Thus the equation means a <form> is a <function-part> followed by the symbol "["
followed by zero or more <arg>'s followed by the symbol "]".
This use of ellipses can always be replaced by a sequence of BNF equations.
For example, this instance can be replaced by:
.BEGIN TABIT1(17);
<form>\::= <constant> | <function-part>[<arg-list>] | <function-part>[ ] | <variable>
<arg-list>\::= <arg> | <arg-list>;<arg>
.END
To increase lucidity
we will frequently violate these syntax equations, allowing function names
containing special characters, e.g. %3fact*%*, %3fib%9'%1 or + ; or writing %3x+y%*
instead of %3+[x;y]%*. No attempt will be made to characterize these violations;
occurrences of them should be clear from context.
.GROUP;
Notice that the class <⊗→form↔←> is a collection of LISP expressions which can
be evaluated. A <form> is either:
.BEGIN INDENT 5,5;
%21.%* a constant: the value is that constant.
%22.%* a function name followed by zero or more arguments: we've said a bit about
evaluation schemes for these constructs.
%23.%* a variable: a variable in LISP will typically have an associated value in some
environment.
.END
.APART
Again we will wait to {yonss(P49)} for a precise description.
.P111:
An important constraint on LISP forms which is not covered by the syntax
equations is the requirement that functions are defined as being n-ary for
some %2fixed%* n. Any n-ary function
must have %2exactly%* n arguments presented to it whenever it is applied.
.P95:
Thus %3cons[A], cons[A;B;C], %*and %3car[A;B]%* are all ill-formed expressions
and therefore denote %9B%1.
.CENT(Problems)
I. Discuss %3cons[car[x];cdr[x]] = x%1.
II. Discuss %3cons[car[%9α%*];cdr[%9α%*]] = %9α%1.
.SS(Predicates and Conditional Expressions,conditional expression)
.BEGIN TURN ON "#";
We cannot generate a very exciting theory based simply on %3car, cdr,%* and %3cons
%*
with functional composition. Before we can write reasonably interesting
algorithms we must have some way of performing conditional actions.
To do this we first need ⊗→predicates↔←. A LISP predicate is a function
returning a value representing truth or falsity. We will represent the
concepts of true and false by %et%* and %ef%* respectively. Since these truth
values are distinct from elements of %aS%*, we will set up a new domain %aTr%*
which will consist of the elements %et%* and %ef%*. As usual
the extra element %9B%1 is
included so that we may talk about partial predicates just as we
talked about partial functions on %d<sexpr>%*.⊗↓A word for the
inveterate LISP hacker: our use of %et%* and %ef%* marks our first major
break from current LISP folklore.
The typical LISP trick is to use the atoms %3T%1 and %3NIL%1
rather than %et%1 and %ef%1 as truth values.
Our heresy will disallow some mixed compositions of LISP
functions and predicates. We shall not apologize for that
and by the end of this chapter you should see
why we have deviated.←.
.END
LISP has two primitive predicates.
The first is a strict unary predicate named ⊗→%3atom%*↔←;
%3atom%* is total over %d<sexpr>%*, and
is a special kind of predicate called a %2⊗→recognizer↔←%1 or a %2⊗→discriminator↔←%1.
Recognizers are used to determine the type of an instance of a data structure.
Thus %3atom%* will return %et%* if the argument denotes an atom, and will return %ef%*
if the argument is a non-atomic S-expr.
.GROUP SKIP 1;
.BEGIN CENTERIT;
.GROUP
%3
←atom[A] = atom[NIL] = %et%*
←atom[(A . B)] = %ef%*
←atom[car[(A . B)]] = %et%*
←atom[%9B%3] = %9B%3
%*
.APART
.END
What should we do about the value of constructs like:
%3cons[atom[A];#A]%1? %3atom[A]%1 gives %et%1, but %et%1 is not an element of %aS%* and
thus is not appropriate as an argument to %3cons%1. Using our argument of {yon(P181)}
we extend the domains of the S-expr primitives to
.P184:
.BEGIN CENTER;
%aS%41%1 = %aS%1∪%aTr%1
.END
.BEGIN CENTERIT;SELECT 3; GROUP;
%1For example:%3
←car[s] = car[%9B%3], cons[s; A] = cons[%9B%3; A]%1 for %3s%9ε%aTr
.END
Since all of these functions are strict with respect to the undefined element %9B%1, we have:
.BEGIN CENTERIT;SELECT 3; GROUP;
←atom[%et%3] = %9B%3
←cons[%9B%3; A] = %9B%1
.END
Notice that we now have %2two%* separate domains: S-expressions and truth values.
Since we will be writing functions over several domains we will need a
general recognizer for each domain to assure that the operations defined on each
abstract data structure are properly applied. Thus we introduce the recognizer
%3issexpr%* which will give %et%* on the domain of S-exprs, and will
give %ef%* for any element other than %9B%1⊗↓Recall our footnote on {yon(P183)}.←.
.BEGIN CENTERIT;SELECT 3;GROUP;
←issexpr[(A . B)] = issexpr[A] = %et%*
←issexpr[%et%*] = %ef%*
←issexpr[%9B%*] = %9B%*
.END
The other primitive predicate we need is named ⊗→%3eq%*↔←.
It is a strict binary predicate, partial over the set %d<sexpr>%*;
it will give a truth value only if its arguments are both atomic.
It returns %et%* if the arguments denote the
same atom; it returns %ef%* if the arguments represent different atoms.
%3eq%1 is extended to %aS%41%1 to yield %9B%1 if either argument to %3eq%1
denotes an element not in the set %d<atom>%1.
.BEGIN CENTERIT;
%3
.GROUP SKIP 1;
.GROUP
←eq[A;A] = %et%* eq[A;B] = %ef%*
←eq[(A . B); A] = %9B%3 %3eq[(A . B);(A . B)] = %9B%3
←eq[eq[A;B];D] = %9B%3 eq[%9B%3;x] = %9B%3
←eq[car[(A . B)];car[cdr[(A .(B . C))]]] = %ef%1
.APART;
.END
For completeness' sake we should define a version of %3eq%*, say %3eq%4Tr%1, which
is defined over %aTr%* and acts like %3eq%*. For expediency's sake we will
simply extend the definition of %3eq%* to %aS%41%1 so that it may compare two elements of
%aTr%*.
.BEGIN CENTERIT;
%3
.GROUP
←eq[%et%*;%et%*] = %et%* eq[%ef%*;%9B%3] = %9B%3
←eq[%ef%*;%ef%*] = %et%* eq[%et%*;%ef%*] = %ef%*
←eq[A;%et%*] = %9B%1
.APART;
.END
We need to include a construct in our
language to effect a test-and-branch operation.
The IF-THEN-ELSE operation in LISP is called the conditional expression. It
is written:
.BEGIN CENTERIT;
%3
←[p%41%* → e%41%*; p%42%* → e%42%*; ... ; p%4n%* → e%4n%*]
%1
.END
.P40:
Each %3p%4i%1 is a predicate and therefore takes on values in the
set %aTr%* or gives %9B%1;
each %3e%4i%1 is an expression which will give a value in
%aS%41%1. We will restrict the conditionals such that all the %3e%4i%1
must have values in the %2same%1 domain or be %9B%1; i.e. all be in %d<sexpr>%1
or all be in %aTr%1.
The meaning (or semantics) of conditionals is:
.BEGIN INDENT 10,10;group
We evaluate the %3p%4i%1's from left to right, finding the %2first%*
which returns value %et%*. When we find such a %3p%4i%1,
we evaluate the corresponding
%3e%4i%1.
The value of the conditional expression is the value computed by that %3e%4i%1;
if all
of the %3p%4i%1's evaluate to %ef%* then the conditional expression gives %9B%1.
The conditional expression also gives %9B%1 if we come across a %3p%4i%1
which has value %9B%1 before we hit a %3p%4i%1 with value %et%1.
.END
.BEGIN CENTERIT;
.GROUP;
Examples:
%3
←[atom [A] → B; eq [A;(A . B)] → C] = B
←[eq [A;(A . B)] → C; atom [A] → B] = %9B%3
←[atom [(A . B)] → B; eq [A ; B] → C; eq [car[(A . B)]; cdr[(B . A)]] → E] = E
←[eq [A; A] → %et%3; atom [A] → %ef%3] = %et%3
←[eq [A; A] → %et%3; atom [A] → %3B%3] = %9B%3
%1
.APART
.END
Notice that the p%42%* expression of the first example is undefined, but
the conditional gives value %3B%* since p%41%* gives value %et%*.
Thus conditional expressions are non-strict. Note however that non-strictness
is relative to a single domain; thus the last example above gives %9B%1 since
it contains %3e%4i%1's of differing domains.
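The conditional expression has a direct counterpart in modern dialects. The
sketch below is ours, using Common Lisp's %3cond%*, which examines its clauses
left to right and evaluates only the chosen %3e%4i%1; note, however, that Common
Lisp's %3eq%* simply returns false on non-atomic arguments rather than being
undefined.
.BEGIN NOFILL;GROUP;
;; The third example above, rendered with COND.
(cond ((atom '(A . B)) 'B)
      ((eq 'A 'B) 'C)
      ((eq (car '(A . B)) (cdr '(B . A))) 'E))   ; value is E; the first two predicates are false
.END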
.BEGIN TURN ON "#";
Frequently it is convenient to use a special form of the conditional expression
where the final %3p%4n%1 is guaranteed to be true.
You can think of lots of predicates which
are always true (for example, %3eq[1;1]%* ). A natural predicate is the
constant predicate, %et%*; the value of %et%* is always %et%*.
.END
.GROUP
Thus the special form:
.BEGIN CENTERIT;
%3
←[%3p%41%* → %3e%41%*; ...; %et%* → %3e%4n%*]
%1
.END
.BEGIN TURN ON "#";
Now if we know that the previous
%3p%4i%1's are either true or false⊗↓and we know that all the %3e%4i%1's are
elements of the same domain.←,
the final %3p%4n%1#→#%3e%4n%1-case is a catch-all
or otherwise-case which will be executed if none of the previous %3p%4i%1's
give %et%*.
Thus the use of %et%* in this context can be read "otherwise";
and the conditional can be read:
.END
.BEGIN CENTERIT;
←"If %3p%41%1 is true then %3e%41%1, else if %3p%42%1 is true then ..., otherwise %3e%4n%1."
.END
.APART;
.BEGIN TURN ON "#";
.P17:
The introduction of conditional expressions has further widened the gap between
traditional mathematical theories and computational theories.
Previously we could almost side-step the issue of order of evaluation;
it didn't really matter unless %9B%* got into the act. But now
the very definition of meaning of conditionals involves an order of evaluation.
First, the order of evaluation is important from a computational
viewpoint on the grounds of efficiency: if we are going to give as value
the leftmost %3e%4i%1 whose %3p%4i%1 evaluates to %et%*, then there is
no need to compute any of the other %3e%4j%1's; those values will never be used.
A more pressing difficulty is that of partial functions. If we did not
impose an order of evaluation on the components of a conditional, then
frequently we would attempt to evaluate expressions which would lead
to undefined results: %3[eq[0;0]#→#%31%*;#%et%*#→#car[A]]%1 gives %31%*
using the meaning of conditionals, whereas the expression would be undefined
if we were forced to evaluate %3car[A]%*. This would cause us to lose
unnecessarily if we think of an occurrence of %9B%1 during evaluation as being
mapped to an error message and causing termination of the computation.
But, if we continue to allow
%9B%* as an argument or value to a function, then we can characterize
the effect of a conditional expression as a %2⊗→non-strict↔←%* function.
Recall, a non-strict function is allowed to return a value other than %9B%*
when one of its arguments is %9B%*; or, put another way, we don't examine
the definedness of arguments before applying the function.
.END
For example,
let %3if(x;y;z)%1 be the %2conditional function%1 ⊗↓Notice we are
writing `(...)' rather than `[...]' since we are talking
about the function and not the algorithm. See {yon(P127)}.←
computed by: %3[x#→#y;#%et%*#→#z]%1.
We can define %3if%* as a non-strict function
such that:
.BEGIN TABIT1(15);GROUP;
\%3y%* if %3x%* is %et%*
%3if(x;y;z) = \z%1 if %3x%* is %ef%*
\%9B%* if %3x%* is %9B%*
.END
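Common Lisp's %3if%* special form behaves the same way in its branch
positions; a one-line illustration (ours, not in the notation of this text):
.BEGIN NOFILL;GROUP;
;; Only the selected branch is ever evaluated.
(if (= 0 0) 1 (car 'A))     ; returns 1; (car 'A) is never reached
.END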
.P112:
However there is more to the "strictness" implied by conditional
expressions than just making sure that proper arguments are passed on function
calls.
.group
Consider the following algorithm:
.BEGIN CENTERIT; SELECT 3;
←f[x] <= [x=0 → 1; %et%* → f[x-1]]
.END
.apart
Assume the domain of discourse is the positive and negative integers; and
assume that %3f%* is non-strict. That is,
%3f%1 will try to compute with %2any%1 (integer) argument it is given.
This algorithm defines a function giving %31%* for any non-negative integer and
is undefined for any other number. From a computational point of view, however,
%3f[-1]%* appears "undefined" in a different sense from %3car[A]%* being
"undefined".
The computation %3f[-1]%1 does not terminate and is said to %2diverge%1.
For a partial function like %3car%*, we can conceive
of giving an error message whenever we attempt to apply the function
to an atomic argument. But it stretches one's credulity to expect
to include tests like "if the computation %3f[a]%* does not terminate then
give error No. 15."⊗↓Indeed, there are good reasons to be sceptical about
the existence of such omniscient tests.←. From the purely
functional point of view, %3f%* still defines the partial function
which is %31%* for the non-negative integers, regardless of how badly
it botches things up for negative numbers.
So a computation may be "undefined" for two reasons:
it involves a non-terminating computation or it involves applying a
partial function to a value not in its domain.
Note that the distinction between "undefined" and "diverges" is fuzzy. If we
restrict the domain of the above %3f%1 to the natural numbers, then %3f[-1]%1
denotes %9B%1 rather than diverges. Or, put another way, "undefined strictness"
is a special case of "divergent strictness" where we are able to predict
which computations will not terminate. Those cases can be checked by
defining the function to be strict over a domain which rules out those anomalies.
Thus a case can be made for identifying divergent computations
with %9B%1; however there is typically more to
non-termination than just "wrong kind of arguments applied".
We want to extend our discussion of strictness to encompass
divergence. Recall the discussion on {yon(P106)} of
.BEGIN CENTERIT;SELECT 3;
←foo[x;#y]#<= y.
.END
Defining %3foo%1 to be strict required that each application of %3foo%1
determine whether either argument denoted %9B%1.
If we want %3foo%1 to be strict with respect to divergence, then
we must test each argument for divergence. That implies evaluation of
each of the arguments,
which in turn implies that if a computation of an argument diverges, then the computation
of the function application must also diverge.
This implies that it is natural to associate "strict with respect to divergence"
with %aCBV%1, since in the process of checking for termination, we must compute
values. However we already know that if a function is strict then
calling style doesn't matter.
In contrast, a non-strict function does not check arguments for divergence, and
indeed the divergence of a computation may depend on the calling style.
Consider the evaluation of %3foo[f[-1];#B]%1 where %3f%1 was defined above and is
total over the integers. This evaluation will diverge under %aCBV%1 while it
converges to %3B%1 using %aCBN%1.
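The behaviour can be seen directly in the sketch below, which is ours, in
Common Lisp, and reuses the hypothetical %3foo-by-value%* and %3foo-by-name%*
from the earlier calling-style sketch.
.BEGIN NOFILL;GROUP;
;; f loops, until resources are exhausted, on negative integer arguments.
(defun f (x) (cond ((= x 0) 1) (t (f (- x 1)))))
;; (foo-by-value (f -1) 'B)               never returns: the argument is evaluated first.
;; (foo-by-name  (lambda () (f -1)) 'B)   returns B: the thunk is never forced.
.END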
We cannot restrict all our functions to be strict if we expect to do any
non-trivial computation.
That is, we need a function which can determine its value without complete information
concerning the values of all of its arguments --a "don't care condition"--.
If all the arguments to all subfunctions must be evaluated
the computation will not terminate.
.P180:
The conditional function is such a non-strict function.
That is %3if(%et%*;q;r)%1 has value %3q%* without
knowing anything about what happens to %3r%*.
In particular,
%3if(%et%*;q;%9B%*)#=#q%1.
Likewise %3if(%ef%*;%9B%*;r)#=#r%1.
Now since %3if%* is to be a function and therefore single-valued,
if %3if(%et%*;q;%9B%*)#=#q%1 then for any argument %3x%*,
%3if(%et%*;q;x)#=#q%1.
Thus %3if(%et%*;x;y)%1 and %3if(%ef%*;y;x)%1 act like functions of the
form:
.BEGIN CENTERIT SELECT 3;
←f(x;%9B%*) = z %1 implies for %2every%1 %3y%* in the domain %3f(x;y)=z%1.
.END
Such functions are said to be %2⊗→monotonic functions↔←%1.
Notice that %9B%* is now carrying an additional "don't-care" interpretation,
certainly consistent
with its previous assignments when we think of the function being computed
by the algorithm.
Even given that a computational definition is desired,
there are other plausible interpretations of conditionals.
Consider the nonsense definition:
%3g[x;y]#<=#[lic[x]#→#1;%et%3#→#1]%1. Clearly, assuming that %3lic%* is a total predicate,
any value computed by %3g%* will be %31%*. But requiring left-to-right evaluation
could spend a great deal of unnecessary computation if %3lic%* is a
%dl%*ong %di%*nvolved %dc%*alculation.
Questions of evaluation are non-trivial. You should at least be aware that some
decisions have been made and others were possible.
.P131:
What benefits have resulted from our study of %9B%* and divergence?
We should have a clearer understanding of the difference between
function and algorithm and a better grasp of the kinds of difficulties
which can befall an unwary computation.
We have uncovered an important class of detectable errors.
The character of these miscreants is that they occur in the
context of supplying the wrong %2kind%* of argument to a function. This
kind of error is called a %2⊗→type fault↔←%*, meaning that we expected
an argument of a specific type, that is from a specific domain,
and since it was not forthcoming, we
refuse to perform any kind of calculation.
Thus %3atom[%ef%*]%1 and %3cons[%et%*;A]%1 are undefined
since both expect elements of %aS%* as arguments. See {yon(P132)} for further
discussion of type faults.
Divergent computations are equally repugnant but there is no general
method for testing whether an arbitrary calculation will terminate.
It may not seem like you can do much useful computation with such
a limited collection of operations as those proposed in LISP.
Things are not quite as trivial as they might seem.
In elementary number
theory all you have is zero and some simple functions, and elementary number theory
is far from elementary.
Manipulation of our primitives, with composition and conditional expressions,
coupled with techniques for definition, can %2also%* become complicated.
Let's apply the LISP constructs which we now have, and define
a new LISP function.
For example: our predicate %3eq%* is defined only for atomic arguments. We would like
to be able to test for equality of arbitrary S-exprs.
What should this more complex equality mean?
By equality we mean: as trees, the S-exprs have the same branching structure; and
the corresponding terminal nodes are labeled by the same atoms.
Thus, we would
like to define a predicate, ⊗→%3equal%*↔←, such that:
.GROUP SKIP 1;
.GROUP
.BEGIN CENTERIT;
%3
←equal [(A . B);(A . B)] = %et%* = equal [A;A]
←equal [(A . B);(B . A)] = %ef%*
←equal [(A . (B . C));(A . (B . C))] = %et%*
←equal [(A . (B . C));((A . B) . C)] = %ef%*
%1
.END
.APART
Here's an intuitive description of such a predicate named %3equal%*.
.BEGIN INDENT 0,4;
1. If both arguments are atomic then see what %3eq%* says about them
(are they "%3eq%*"). We can test if they are both atomic by using %3atom%*
and a conditional expression.
.END
2. If one is atomic and the other is not they can't be equal S-exprs.
.BEGIN INDENT 0,5;
3. Otherwise both are non-atomic S-exprs. Both have two sub-expressions.
Look at both first subexpressions. If the first sub-expressions are not
equal then certainly the initial expressions cannot hope to be equal.
If, however, the first subexpressions are equal then the question of
whether or not the initial expressions are equal depends on the
equality (or non-equality) of the second subexpressions. Thus the
following definition:
.END
%3
.GROUP SKIP 1;
.BEGIN TABIT2(19,30);GROUP;
.P1:
equal[x;y] <=\[atom[x] → [atom[y] → eq [x;y];
\\ %et%* → %ef%*];
\ atom[y] → %ef%*;
\ equal [car[x];car[y]] → equal[cdr[x];cdr[y]];
\ %et%* → %ef%*]
.END
%1
Notice that we use nested conditional expressions in %3equal%*; e%41%*
is itself a conditional. Also we have used predicates in the e%4i%* positions
at e%43%* and e%411%*; this is perfectly reasonable since %3equal%* is a predicate
and thus returns either %ef%* or %et%*.
Let's show that %3equal%* does perform correctly for a specific example. This will
also show a complicated evaluation of a conditional expression.
.BEGIN TABIT2(25,44);GROUP;SELECT 3;
equal[(A . B);(A . C)] =\[atom[(A . B)] → [atom[(A . C)] → eq [(A . B);(A . C)];
\\ %et%* → %ef%*];
\ atom[(A . C)] → %ef%*;
\ equal [car[(A . B)];car[(A . C)]] → equal[cdr[(A . B)];cdr[(A . C)]];
\ %et%* → %ef%*]
.APART
.BEGIN FILL;SELECT 1;
Now using the meaning of conditionals ({yon(P40)}),
we find that p%41%* (i.e.,#%3atom[(A#.#B)]%*#)
and p%42%* (#%3atom[(A#.#C)]%*#) when evaluated (in order) give %ef%*.
We must now evaluate p%43%*: %3equal[car[(A#.#B)];car[(A#.#C)]]%*.
This reduces to %3equal[A;A]%*, and:
.END
.GROUP;
equal[A;A] =\[atom[A] → [atom[A] → eq[A;A];
\\ %et%* → %ef%*];
\ atom[A] → %ef%*;
\ equal [car[A];car[A]] → equal[cdr[A];cdr[A]];
\ %et%* → %ef%*]
.APART
.BEGIN FILL;SELECT 1;
This conditional expression will finally evaluate to %et%*. So p%43%* in the
original call of %3equal[(A#.#B);(A#.#C)]%* is true, and we are faced with the evaluation of e%43%*
which is %3equal[cdr[(A#.#B)];cdr[(A#.#C)]]%*. After evaluation of the arguments and
evaluation of the conditional expression defining %3equal%* we will finally
return the value %ef%*. That is, %3(A . B)%* and %3(A . C)%* are %2not%* equal.
Clearly, evaluation of LISP expressions in such great detail is not a process
which we wish to carry out very often. It should be clear that such a process
is a likely candidate for execution by a machine.
Notice that throughout this example expressions like %3atom[(A#.#B)]%*
or %3eq[(A#.#B);(A#.#C)]%* were appearing but were never evaluated because
of the way in which we defined evaluation of conditionals.
.END
.END
Finally, to include conditional expressions in our syntax of LISP
expressions, we should add:
.BEGIN TABIT1(11);
<form>\::= [<form> → <form>; ...; <form> → <form>] (see {yon(P66)})
.END
As usual these syntax equations fail to capture all of our intended
meaning. The <form>s appearing in the p%4i%*-position are to be forms
taking values in %aTr%*, the truth domain.
.REQUIRE "PROB2.PUB" SOURCE_FILE;
.REQUIRE "PROB3.PUB" SOURCE_FILE;
.SS(Sequences: abstract data structures,,P100:)
.BEGIN TURN ON "#";
In several areas of mathematics it is convenient to deal
with sequences of information; that is, with problem domains more naturally
described as collections of numbers rather than as individual numbers.
This may either simplify understanding of the problem or
simplify the formulation of the functions defined on the domain.
Thus several
common programming languages include arrays as
representations of these mathematical ideas.
First we should notice that sequences
are data structures. We will have to describe constructors, selectors, and
recognizers for them.
Also we will explore applications of sequences as data structures suitable for
representing many non-trivial problems in computer science.
After a certain familiarity is gained in the application of algorithms which
manipulate sequences, we will discuss the problems of representation and
implementation of this data structure. We will first give an implementation
of sequences in terms of S-expressions. That is, we will describe an
%9r%*-mapping giving a representation of sequences and their primitive operations
in terms of LISP's S-exprs and primitive functions. Later in {yonss(P137)}
we will discuss low-level implementation of this data structure in terms of
conventional machines.
But first we will study sequences as abstract data
structures: what are their essential structural characteristics?
what properties should be present in a programming language to allow
a natural and flexible representation? This discussion will shed
light on the important problems of representation and abstraction.
A sequence is an ordered set of elements⊗↓For an alternative description
of sequences and a discussion of a different view of data structures see
{yon(P147)}.←.
For example, %2(x%41%*, x%42%*, x%43%*)%1 is standard notation for
a sequence of the three elements %2x%41%*, x%42%1, and %2x%43%1.
The length of a sequence is defined to be the number of elements in that sequence.
We will allow sequences to have
sub-sequences to an arbitrary finite depth.
That is, the elements of a sequence will either be individuals or will themselves be
sequences.
For example, a sequence of length %3n%*, each of whose elements is a sequence of
length %3m%*, is a matrix.
Here are BNF equations for sequences and their elements:
.P114:
.BEGIN TABIT1(15);
.GROUP;
<seq>\::= %2(%* <seq elem>, ...,<seq elem> %2)%* ⊗↓For the meaning of these ellipses see
{yon(P130)}.←
<seq elem>\::= <indiv> | <seq>
<indiv>\::= <literal indiv>|<numeral>| -<numeral>
<literal indiv>\::= <indiv letter>|<literal indiv><indiv letter>|<literal indiv><digit>
<numeral>\::= <digit>|<numeral><digit>
<indiv letter>\::=%2 A |B |C |D |E |F |G |H |I |J |K |L |M |N |O |P |Q |R |S |T |U |V |W |X |Y |Z%*
<digit>\::= %20 |1 |2 |3 |4 |5 |6 |7 |8 |9%*
.APART;
.END;
Notice that the structure of <indiv> is the same as that for LISP's <atom>;
the only difference is in the fonts used for letters and digits. We have
made the distinction between LISP atoms and sequence individuals intentionally.
Thus %2(A,#(B,#C),#D,#(E,#B))%* is a sequence of length four, whose second
and fourth elements are also sequences.
We will use "%2()%*" as notation for the empty sequence.
We want to write LISP-like functions operating over sequences, so we will
at least
need to give constructors, selectors and predicates for sequences⊗↓Ultimately
we will want to give a representation for sequences as S-expressions, and
give representations for the sequence operations
as operations on S-expressions.←. As in the case of Symbolic expressions,
we will include the undefined element, and
the full domain of sequences will be named
.BEGIN CENTER; TURN OFF "{","}";
%ASeq%1 = %d<seq>%*∪{%9B%1}.
.END
As on {yon(P184)}, we extend the primitive LISP operations
to include this new domain, by defining:
.BEGIN CENTER; TURN OFF "{","}";
%AS%42%1 = %aS%41%1∪%d<seq>%*,
.END
and extend each operation appropriately over %aS%42%1. Thus, for example:
.BEGIN CENTERIT;SELECT 3;GROUP
←atom[%2A%3] = %9B%3
←car[%2A%3] = %9B%3
←car[%2(A, B)%3] = %9B%3
←cons[%2A%1; %2B%3] = %9B%3
←issexpr[%2(A)%3] = %ef%3
.END
We need to define some data structure operations specific to sequences.
What are the essential characteristics of a sequence? First, a sequence
either is empty or has elements. Thus we will want a predicate to test for emptiness.
Next, if the sequence is non-empty, we should be able to select elements. Finally,
given some elements, we should be able to build a new sequence from them.
The basic predicate, which tests for emptiness, is called ⊗→%3null%*↔←.
Predicates on sequences are like predicates on S-expressions,
mapping sequences to truth values in %aTr%*⊗↓The
reason for restructuring LISP predicates should now be apparent
to previous users of LISP: if we mapped the truth values to
the atoms %3T%* and %3NIL%* as is typically done, then we'd have to
map truth values of sequence-predicates to representation as sequences. Indeed
we would have to perpetuate the fraud for every new abstract data structure
domain that we wanted to introduce. Thus we did it right the first time.←.
.BEGIN CENTERIT;
.BEGIN TABIT2(10,20);
\\%et%1 if %3x%* is the empty sequence, %2()%*.
\%3null[x]%1 is \%ef%* if %3x%* is a non-empty sequence.
\\%9B%* otherwise.
.END
←%3null[%2( )%*] = %et%*
←%3null[%2(A, B)%*] = %ef%*
←%3null[%ef%*] = %9B%1
.BEGIN FILL
Thus %3null%* gives usable values only for sequences.
Since we intend to operate on domains which contain data structures other than
sequences, we will need a ⊗→recognizer↔← to be sure that %3null%* is
not applied to arguments which are %2not%* sequences.
We will name this recognizer ⊗→%3isseq%*↔←.
.END
←%3isseq[%2(A, B, C)%*] = %et%*
←%3isseq[%2A%*] = %ef%1
←%3isseq[%3A%*] = %ef%1 %3isseq[%et%*] = %ef%1
←%3isseq[%2()%*] = %et%1
←%3isseq[%9B%*] = %9B%1
.BEGIN FILL;TURN ON "#";
Thus %3isseq%* is a total predicate over all domains, whereas
%3null%* is only partial, being total just over %d<seq>%1.
While on the subject of predicates, there are a couple more we shall need.
The first one is a recognizer, %3isindiv%*, which will give value %et%*
if its argument is an individual, give %ef%* if its argument is a sequence,
and will give %9B%1 otherwise.
The second predicate is the extension of the equality relation to the class
of sequence individuals. We shall use the same name, %3eq%*, as
we did for the S-expression predicate.
Indeed, whenever we define a new abstract data type we will assume that
an appropriate version of %3eq%* is available for the elements of the base
domain. One of our first tasks will be to extend that equality relation to the whole
domain. We will do so for sequences later in this section.
Equality is
a basic relation in mathematics so it is not surprising to see it play an
important role here.
%3eq%* is one
of the few relations which we shall define across all domains.
Functions or predicates like %3eq%1, which are
applicable on several domains, are called %2⊗→polymorphic functions↔←%*.
.END
.GROUP
.BEGIN FILL;
Next, the selectors for a (non-empty) sequence include %3first%*, %3second%*, %3third%*, etc.,
where:
.END
←%3first[%2(A, B, C)%*] = %2A%*
←%3second[%2(A, B, C)%*] = %2B%*
←%3third[%2(A, B)%*] = %9B%1
.APART
.GROUP
.BEGIN FILL
It is also convenient to define an "all-but-first" selector, called ⊗→%3rest%*↔←.
.END
←%3rest[%2(A, B, C)%*] = %2(B, C)%*
←%3rest[%2(B, C)%*] = %2(C)%*
←%3rest[%2(C)%*] = %2()%*
←%3rest[%2C%*] = %9B%1
←%3rest[%2( )%*] = %9B%1
.APART
.GROUP
.BEGIN FILL
In conjunction with %3rest%*, we shall utilize a constructor, ⊗→%3concat%*↔←,
which is to add a single element to the front of a sequence.
.END
←%3concat[%2A%*;%2(B,C)%*] = %2(A, B, C)%*
←%3concat[%2A%*;%2()%*] = %2(A)%*
←%3concat[%2(A)%*;%2(B,C)%*] = %2((A), B, C)%*
←%3concat[%2(B,C)%*;%2A%*] = %9B%1
←%3concat[%2A%*; %2B%*] = %9B%1
.APART
.END
.GROUP
.SELECT 1;
The final constructor is called ⊗→%3seq%1↔←; it takes an arbitrary number of sequence
elements as arguments and returns a sequence consisting of those elements (in the
obvious order).
Let %9α%41%1,#...,#%9α%4n%1 be elements of <seq elem>, then:
.BEGIN CENTERIT;SELECT 3;
←seq[%9α%41%3; %9α%42%3; ...; %9α%4n%3] = %2(%9α%41%2, ..., %9α%4n%2)%1
.END
.APART
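For example, we would expect:
.BEGIN CENTERIT;GROUP;
←%3seq[%2A%*; %2(B, C)%*; %2D%*] = %2(A, (B, C), D)%1
←%3seq[%2A%*] = %2(A)%1
.END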
One question may have come to mind: how do we know when we have a sufficient set
of functions for the manipulation of an abstract data structure?
How do we know we haven't left some crucial functions out? If we have enough,
how do we know that we haven't included too many?
Actually, this second case isn't disastrous, but when implementing the functions
it would be nice to minimize the number of primitives we have to program.
These problems are worthy of study and are the concern of anyone
interested in the design of programming languages. We will say a bit more
about solutions to these questions beginning on {yon(P116)}.
Notice that we have been describing the sequence functions
without regard to any underlying representation.
We have said nothing about these sequence operations except that they
construct, test, or select.
What we are doing is considering sequences as abstract data structures,
suitable for manipulation by LISP-like programs. How sequences are represented
on a machine, or indeed as S-expressions, is irrelevant. Sequences have
certain inherent structural properties and it is those properties which
we must understand %2before%* we begin thinking about representation on a machine.
In the next section we will show how to represent sequences
as certain S-expressions and sequence operations as
LISP operations on that representation.
.END
First let's develop some expertise in manipulating sequences.
The first example will be our promised extension of the equality relation
to sequences. We perpetuate the name %3equal%* from S-exprs, and the basic
structure of the definition will parallel that of its namesake; but
the components of the definition will involve sequence operations rather
than S-expr operations. It might be of value to compare the two predicates.
The S-expr version is to be found on {yon(P1)}.
.BEGIN TABIT2(19,30);GROUP;SELECT 3;
equal[x;y] <=\[null[x] → [null[y] → %et%*;
\\ %et%* → %ef%*];
\ null[y] → %ef%*;
\ equal%9'%* [first[x];first[y]] → equal[rest[x];rest[y]];
\ %et%* → %ef%*]
.END
.BEGIN TABIT2(19,30);GROUP;SELECT 3;
equal%9'%*[x;y] <=\[isindiv[x] → [isindiv[y] → eq[x;y];
\\ %et%* → %ef%*];
\ isindiv[y] → %ef%*;
\ %et%* → equal[x;y]]
.END
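For example, these definitions give:
.BEGIN CENTERIT;GROUP;
←%3equal[%2(A, (B, C))%*;%2(A, (B, C))%*] = %et%1
←%3equal[%2(A, B)%*;%2(A, (B))%*] = %ef%1
←%3equal[%2( )%*;%2( )%*] = %et%1
.END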
Next, we will write a predicate %3member%* of two arguments %3x%* and %3y%*.
%3x%* is to be an individual; %3y%* is to be a sequence;
%3member%* is to return
%et%* just in the case that %3x%* is an element of the sequence %3y%*.
What does this specification tell us?
The predicate is partial. The recursion should be on the structure of %3y%*;
and termination (with value %ef%*) should occur if %3y%* is the empty sequence.
If %3y%* is not
empty then it has a first element (call it %3z%*); compare %3z%* with %3x%*.
If these elements are identical then %3member%* should return %et%*; otherwise
see if %3x%* occurs in the remainder of the sequence %3y%*.
.GROUP
Notes:
%21.%* We cannot use %3eq%* to check equality since, though %3x%* is an individual,
there is no
reason to believe that the elements of %3y%* need be.
%22.%* Recall that we can get the first element of a sequence with
%3first%*,
and the rest of a sequence with %3rest%*.
.APART
So here's %3member%*:
.BEGIN TABIT2(16,32);SELECT 3;GROUP;
\member[x;y] <=\[null[y] → %ef%*;
\\ equal%9'%*[first[y];x] → %et%*;
\\ %et%* → member[x;rest[y]]]
.END
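For example:
.BEGIN CENTERIT;GROUP;
←%3member[%2B%*;%2(A, B, C)%*] = %et%1
←%3member[%2B%*;%2((B), C)%*] = %ef%1
←%3member[%2A%*;%2( )%*] = %ef%1
.END
The second example depends on the first note above: %3eq[%2(B)%*;%2B%*]%1 would have
been %9B%1, but %3equal%9'%*[%2(B)%*;%2B%*]%1 is simply %ef%*.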
Finally here is an arithmetic example to calculate the length of a sequence;
that is, the number of elements in the sequence.
.P118:
.BEGIN CENTERIT;SELECT 3;
←length[n] <= [null[n] → 0; %et%* → plus[1;length[rest[n]]]]
.END
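For example, the computation unwinds as:
.BEGIN CENTERIT;GROUP;
←%3length[%2(A, (B, C), D)%*] = plus[1;length[%2((B, C), D)%*]]
←%3= plus[1;plus[1;plus[1;length[%2( )%*]]]] = plus[1;plus[1;plus[1;0]]] = 3
.END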
.SS(Lists: representations of sequences,lists)
It is all well and good to be able to write LISP-like functions describing
operations on sequences. The algorithms are clean and understandable.
However, if we wish to run these programs in a LISP environment, then
we had better be prepared to represent both the data structures and the
algorithms in terms understandable to LISP⊗↓Indeed if we wish LISP to
run on a conventional machine we had better be prepared to represent
LISP's data structures and algorithms in a manner understandable
to conventional hardware. This task is the subject of later chapters
in the book.←. This is the problem of representation. Granted, we
could have overcome the problem by explicitly representing sequences directly
as LISP S-expressions and could have written functions in LISP which
used %3car-cdr%*-chains to directly manipulate the representations.
This misuse of representation is a common fault in
LISP programming and its practice is to be discouraged.
First, the resulting programs are much more difficult to read, debug, and understand.
More important, the programs are explicitly tied to a specific representation
of the abstract data structure; if at some later date it is desired to
change the representation, then many programs will have to be rewritten.
In {yonss(P41)} we develop a complex algorithm for
differentiation on a class of polynomials, moving from an unclear
and highly representation-dependent formulation, to a clear, concise,
representation-independent algorithm.
Obviously we will always have to supply a representational bridge between
the abstract data structures and algorithms, and their concrete counterparts.
One aspect of this study of data structures is to understand what is
required to build this bridge and how best to represent these requirements
in a programming language.
The first decision to be made is how to represent the abstract data structure;
how should we represent sequences as S-expressions? Indeed how should we
choose representations in general? Usually there is not just one "best"
representation. Some obvious considerations involve the difficulty of
implementing the primitive operations (constructors, selectors,
recognizers, and predicates) on the abstract data structure. Also we must
keep in mind the kinds of algorithms which we wish to write;
computation takes time, and since this is computer science we should
give consideration to efficiency.
.GROUP
A reasonable choice for a representation of sequences as S-expressions
is the following:
.BEGIN CENTERIT;
←%9r%f(%1<indiv>%f)%1 = %9.%4<atom>%1
←and for %9α%41%1,#...,#%9α%4n%1 in %d<seq elem>%*:
.END
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "?" FOR "\"
.TURN ON "∞" FOR "%";
.TABS 10,35;
? ∞9r∞f(∞2 (∞9α∞41∞1, ..., ∞9α∞4n∞2) ∞f)∞1 = ∞G? /\
? ? / \
? ? ∞9r∞f(∞2 ∞9α∞41∞2 ∞f)∞g \
? ? \
? ? \
? ? /\
? ? / \
? ? ∞9r∞f(∞2 ∞9α∞42∞2 ∞f)∞g
? ? ...
? ? \
? ? /\
? ? / \
? ? / \
? ? / ∞3NIL∞*
? ? ∞9r∞f(∞2 ∞9α∞4n∞2 ∞f)
.END
.APART
.BEGIN TURN ON "#";
That is, the right-hand branch in this LISP-tree representation of a sequence
will always point to the rest of the sequence or will be the atom %3NIL%*.
Notice that the description of the %9r%*-mapping is recursive. Thus for example:
.GROUP;
.BEGIN SELECT g;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "?" FOR "\"
.TURN ON "∞" FOR "%";
.TABS 15,37;
? ∞9r∞f(∞2 ((A,B,C),(D)) ∞f)∞1 = ∞G? /\
? ? / \
? ? / \
? ? ∞9r∞f(∞2 (A,B,C) ∞f)∞g \
? ? \
? ? \
? ? /\
? ? / \
? ? / ∞3NIL∞*
? ? ∞9r∞f(∞2 (D) ∞f)∞g
.END
which will finally expand to %3((A .(B . (C . NIL))) .((D . NIL) . NIL))%*
since %9r%f(%2#(A,B,C)#%f)%1 is %3(A#.(B#.(C#.NIL)))%1 and
%9r%f(%2#(D)#%f)%1 is %3(D#.#NIL)%*.
.APART
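Similarly, %9r%f(%2#(A,(B),C)#%f)%1 is %3(A#.((B#.#NIL)#.(C#.#NIL)))%1, since
%9r%f(%2#(B)#%f)%1 is %3(B#.#NIL)%*.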
You should become fluent in translating between
S-expr notation and sequence notation.
For convenience' sake we will carry over the sequence notation --#%2(A,#B,#C)%*#--
to that for the representation in LISP --#%3(A,#B,#C)%*#--⊗↓
%1Be aware that
%3A%* is an atom and %2A%* is a sequence element; they are not the same
data structure.←
thinking of %3(A,#B,#C)%* as an abbreviation for %3(A#.(B#.(C#.#NIL)))%*.
.END
Next, what about a representation for the empty sequence? Looking at the
representation of a non-empty sequence it appears natural to take ⊗→%3NIL%*↔← as
%9r%f(%2()%f)%1 since after you have removed all the elements from the sequence
%3NIL%* is all that is left in the representation.
To be
consistent then:
.BEGIN CENTERIT;
←%9r%f(%2#()#%f)%1 = %3NIL%*
.END
This gives us a complete specification of the %9r%*-mapping for the domain;
we have represented the abstract domain of sequences in a subset of the
domain of Symbolic Expressions.
The S-expr representation of a sequence is called a %2⊗→list↔←%*; and we
will refer to the abbreviation,
.BEGIN CENTERIT;GROUP;
←%3(%9α%41%3,#...,#%9α%4n%3)%1 for
←%3(%9α%41%3#.(%9α%42%3#.##...#(%9α%4n%3#.#NIL)#...)%1 as %2⊗→list-notation↔←%*.
.END
Thus sequences are the abstract data structure; lists are their representation.
Since the atom %3NIL%* takes on special significance in list-notation
it is endowed with the special name %2⊗→list terminator↔←%*.
.GROUP
And
a notational point: in graphical interpretation of list-notation
it is often convenient to write:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
.turn on "?" FOR "\"
.TABS 20,45;
?⊂αααπααααα⊃ ? ⊂αααπαα⊃
?~ ~ NIL ~ ∞1as∞* ? ~ ~≤'~
?%ααα∀ααααα$ ? %ααα∀αα$
.END
.APART
%1
.GROUP
Thus, for example %3(A, (B, C), D)%* is:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#";
.TURN ON "∞" FOR "%";
⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ A ~ #αβα→~ # ~ #αβα→~ D ~≤'~
%ααα∀ααα$ %αβα∀ααα$ %ααα∀αα$
~
~ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ B ~ #αβα→~ C ~≤'~
%ααα∀ααα$ %ααα∀αα$
.END
.APART
or, in "dotted-pair" notation: %3 (A .((B .(C . NIL)).(D . NIL)))%1.
Finally, in list-notation the commas can be replaced by spaces⊗↓This convention
is one of the few instances of a "good" bug. The early LISP papers required
full use of commas, but due to a programming error in the LISP output routine,
lists were printed without commas. It looked so much better that the bug
became institutionalized.←
.BEGIN CENTERIT;
%1←e.g. %3(A, (B, C), D) = (A (B C) D).
←%1but beware: the "dots" in dot-notation are %2never%* optional!
←%1that is %3(A . (B . C)) %a≠%* (A (B C)).
.END
Let's take stock of our position: We have an intuitive understanding
of what we mean by "sequence". We have described selectors, constructors,
and recognizers, albeit at an abstract level, for manipulating
sequences. We have represented our notion of sequences
as a subset of the S-expressions called lists.
Clearly the final step is to represent our
sequence-manipulators as certain LISP functions.
Let %3first%4r%1 be a LISP function which will represent the sequence
operation %3first%*⊗↓Indeed, once the %9r%*-mapping is defined
on the %2domain%* it is induced on the %2operations%*.←.
Then for example we might expect:
.BEGIN CENTERIT SELECT 3; TURN ON "#";
←%3first%4r%*[(A, B, C)] = %9r%f(%3 first[%2(A, B, C)%3] %f)%3 = A%1.
.BEGIN FILL;SELECT 1;
The problem is that this line is not quite right.
LISP functions expect their inputs to be S-expressions
but %3(A,#B,#C)%* is %2not%* an S-expression.
To be correct we %2should%* have written:
.END
←%3first%4r%*[(A .(B . (C . NIL)))] = A%*
.END
It might be argued that %3(A, B, C)%* is just a convenient abbreviation
for the ugly %3(A#.#(B#.#(C#.#NIL)))%*, but even so, if we wish the
machine to understand and use the abbreviation we must examine the
implications of the notation.
Clearly it is more reasonable to read and write in list notation.
As long as we perform only list-operations on lists there is
no reason to look at the underlying dotted-pair representation⊗↓Indeed,
a strong case can be made for %2never%* allowing any operations on lists
%2except%1 list operations! See the discussion of type-faults on {yon(P131)}
and {yon(P132)}.←.
However, it must be remembered that list operations are carried out on the
machine using the dotted-pair representation.
%2We%* might carry out the "list-to-dotted-pair"
transformations implicitly, but a machine which evaluates LISP
expressions will have to have an explicit transformation mechanism.
So a convenient and even necessary part of our representation of
sequences is the specification of transformations between the
abstract data structure notation and the notation of the underlying
representation.
.BEGIN TURN ON "←";
Next we can give representations for the sequence operations.
To be precise we should continue our convention of writing
the subscript %4r%* on the LISP representation of a sequence operation; thus
%3seq%* is represented by %3seq%4r%1. In most circumstances no confusion is
likely, so we will usually omit the subscript.
In our representation, the construction
of a sequence from an arbitrary number of elements
will be represented by a LISP function
%3seq%4r%1. We will use %3list%* interchangeably with %3seq%4r%1.
.BEGIN CENTERIT;
←%9r%f(%3 seq %f)%* = list
.END
.group
⊗→%3list↔←%3[%9α%41%3; %9α%42%3; ... ;%9α%4n%3]%1 generates
a list consisting of the
%9α%4i%1 arguments. That is,
%3list%1
is the appropriately nested composition of %3cons%*es:
←%3cons[%9α%41%3;cons[%9α%42%3; ... cons[%9α%4n%3;NIL]] ...] %1.
.BEGIN GROUP;NOFILL;
Examples:
.select 3;
←list[A;B] = (A B)
←list[A;B;C] = (A B C)
←list[cons[A;B];car[(A . B)]] = ((A . B) A)
←list[A;list[B;C]] = list[A;(B C)] = (A (B C))
←list[] = ()
←list[NIL] = (NIL)
.END
.apart
Notice that %3list%* is %2not%* strictly a function in the LISP sense;
it %2does%* evaluate its arguments,
but it can take an arbitrary number of them. See {yon(P111)}.
For the moment, %3list%* is simply a notational abbreviation for nested
applications of %3cons%*.
The representation of the selector functions should be apparent from the
graphical representation.
We leave it as an exercise for the reader to specify representations
for these functions; however, here are a few of the other representations:
.GROUP
.BEGIN CENTERIT;
←%9r%f(%3 isindiv %f)%* = atom
←%9r%f(%3 isseq %f)%* = isstrictlist%1 where:
.END
.BEGIN SELECT 3;TABIT2(12,28);
.P185:
\isstrictlist[x] <=\[atom[x] → [eq[x; NIL] → %et%3; %et%3 → %ef%3];
\\ islistelement[car[x]] → isstrictlist[cdr[x]];
\\ %et%3 → %ef%3]
\islistelement[x] <= \[atom[x] → %et%3; %et%3 → isstrictlist[x]]
.END
.APART
Some practitioners of LISP use this strict definition of lists,
so that elements of a list are either atomic or are lists themselves.
In practice it is often convenient to allow
elements of a list to be arbitrary S-expressions. This more liberal
interpretation of lists is expressed by the following recognizer:
.BEGIN CENTERIT;
←%3islist[x] <= [atom[x] →[eq[x;NIL]→%et%*;%et%*→%ef%*]; %et%*→islist[cdr[x]] ]
.END
Thus %3(A,#(A#.#B),#C)%* is a list of three elements.
But beware: %2(A,#(A#.#B),#C)%* is %2not%* a sequence.
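For example:
.BEGIN CENTERIT;GROUP;
←%3islist[(A, (A . B), C)] = %et%1
←%3isstrictlist[(A, (A . B), C)] = %ef%1
←%3isstrictlist[(A, (B, C))] = %et%1
←%3islist[(A . B)] = %ef%1
.END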
.END
.SELECT 1;
.P116:
To summarize the accomplishments of this section, we have in
effect added a %2new%* data structure to the repertoire of LISP.
The addition process included:
.BEGIN INDENT 0,10,10;
%21. The abstract operations.%* We give constructors, selectors, and predicates for
the recognition of instances of the data structure.
%22. The underlying representation.%* We must show how the new data
structure can be represented in terms of existing data structures.
%23. Abstract operations as concrete operations.%* We must write LISP
functions which faithfully mirror the intended meaning of the abstract operations
when interpreted in the underlying representation.
%24. The input/output transformations.%* We should give conventions
for transforming to and from the internal representation.
.END
There is another view of this problem of representability of data structures
due to J. Morris. Here we use the notion of %2transfer functions%1 whose purpose
is to supply mappings between the abstract structure and its representation.
Once the transfer functions are given it is usually easy to supply appropriate
implementations of the abstract primitives. We need two transfer functions;
a write-function, %aW%*, to map the representations into the abstract objects;
and a read-function, %aR%*, to map the abstract objects to their representations.
For the problem at hand, representing sequences, we want %aR%* to map from
elements of %d<seq elem>%* to %d<sexpr>%* (see {yon(P114)} and {yon(P115)}); and
we want %aW%* to map from %d<sexpr>%* to %d<seq elem>%*.
Before we give such %aR%* and %aW%*, let's see what they will do for us.
We could define %3first%4r%1 such that:
.BEGIN CENTERIT;
←%3first%4r%*[x] = %aW%*(car[%aR%*(x)])
.END
Here we have used "(...)" for function application rather than "[...]" which
is reserved for LISP evaluation. What the equation says is that given a
sequence %3x%*, we can map it to the S-expression representation using %aR%*;
the result of this map is an S-expr and therefore suitable fare for %3car%*'s
delicate palate; the result of the %3car%* operation is then mapped back into
the set of sequence elements by %aW%*.
The other operations for manipulating sequences can be described similarly.
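For example, %3rest%4r%1 and %3concat%4r%1 might be rendered in the same style:
.BEGIN CENTERIT;
←%3rest%4r%*[x] = %aW%*(cdr[%aR%*(x)])
←%3concat%4r%*[x;y] = %aW%*(cons[%aR%*(x);%aR%*(y)])
.END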
With this introduction, here are appropriate transfer functions:
.BEGIN TABIT1(15);SELECT 3;
.GROUP
%aW%*(e) <=\[isnil[e] → mknull[];
\ atom[e] → mkindiv[e];
\ %et%* → concat[%aW%*(car[e]);%aW%*(cdr[e])]
\ ]
.END
.BEGIN TABIT1(15);SELECT 3;
.GROUP
%aR%*(l) <=\[null[l] → NIL;
\ isindiv[l] → atomize[l];
\ %et%* → cons[%aR%*(first[l]);%aR%*(rest[l])]
\ ]
.END
We have seen all of the functions and predicates involved in %aR%* and %aW%*
except the three %3atomize%*, %3mknull%1, and %3mkindiv%*. If you think about the implementation
of %3atomize%* and %3mkindiv%* you will see that they are identity functions
for all practical purposes. However they %2are%* so only because of the
representations that we happened to pick; they need not be so simple.
A more careful inspection would show that %3mkindiv%* expects as input an atomic
S-expression and outputs a sequence individual; %3atomize%* acts
conversely. If the representations of the atomic S-expressions were different
from the representations of sequence individuals, then we would have some
work to do.
The scheme of transfer functions is a more mathematical approach than the one
outlined above in %21.%*-%24.%*. We find %21.%*-%24.%* preferable since
it gives a description more in line with what we should expect to implement
in a programming language.
Finally, let us review what has transpired, since it is a model of what is to come:
we developed a new abstract data structure called sequences; discussed
notational conventions for writing sequences; described
operations and pertinent control devices for writing algorithms;
and finally showed that
it was possible to represent sequences in the previously developed
domain of S-exprs. Thus if we had a machine which could execute
S-expr algorithms we could encapsulate that machine within the %9r%*-mapping
such that we could write in sequence-notation and have it translated internally to
S-expr form; we could write sequence-algorithms and have them execute correctly
using the %9r%*-maps of the sequence primitives; and finally it would
produce sequence-output rather than the internal S-expr form. To all intents
and purposes our augmented LISP machine understands sequences.
Indeed, this is the way most LISP implementations are organized; input
may either be in S-expr form or list-notation; internally all data structures
are stored as S-exprs; all algorithms operate on the S-expr form; and finally,
S-exprs which can be interpreted as lists are output in list-notation.
We will approach the other abstract data structure problems in a similar manner,
first developing the data structures independent of their representation, and
later showing how to represent this new domain in terms of some previously
understood domain. We will see in {yonss(P105)} that much of the
mapping from input through output can be specified in a natural style
and LISP can automatically generate the necessary input and output
programs.
.CENT(Problems)
I Discuss %3cons[%9α%41%3;cons[%9α%42%3;%9α%43%3]]%1
as opposed to %3cons[%9α%41%3;cons[%9α%42%3; cons[%9α%43%3; NIL]]]%1
as a representation for %3(%9α%41%3, %9α%42%3, %9α%43%3)%1.
.REQUIRE "PROB4.PUB" SOURCE_FILE;
.SS(A respite)
.SELECT 1;
.BEGIN TURN ON "←";
This section contains some hints and notes on the introductory material of
this chapter. First a reiteration of a previous admonition: though most of
this material seems quite straightforward, the next chapter will begin
to show you that things are not all that trivial. LISP is quite powerful.
The preceding material %2is%* basic and the sooner it becomes second nature
to you the better.
A second admonition: besides learning about the basic constructs of the language,
the previous material should begin to convince you of the necessity for
precise specification of programming languages.
In particular we have seen that the process of evaluation of expressions
must be spelled out quite carefully. Different evaluation schemes lead to
quite different programs. Since evaluation %2is%* the business of programming
languages we'd better do all we can to make a precise specification.
And a final warning:
a major point of this whole book is to stress the proper respect for abstraction
as a tool for controlling complexity in programming, and as a means of
writing implementation (representation) independent programs. As we begin
writing more complex algorithms, the power of abstraction will become more
apparent, but the lessons we learned in representing
sequences contain the essential ideas of abstraction and representation.
Do not forget them.
We have now seen two examples of abstract data structures. First, the study
of Symbolic Expressions was begun without any regard for their implementation.
They were deemed to be objects of sufficient interest in their own right⊗↓But
if there is some nagging doubt about the problems of implementation,
relax; we'll see plenty of this kind of thing later.←. The basic components
of our study were the data structures themselves; the operations on the
data structures (%3car, cdr, cons, eq%* and %3atom%*),
and finally the %2⊗→control structures↔←%* needed by the algorithms
which operated on the data structures. The control structures for
our LISP algorithms are the conditional expression and recursion. They
are called control structures since they are used to direct the flow
of the algorithm as it executes.
Indeed these three components, data, operations, and control, are the
main ingredients of any programming language.
Most languages have a superficially richer class of control devices;
"while"-statements and "DO"-loops are examples. Most control structures
are explicit language constructs, whereas recursion is typically implicit⊗↓However
some languages do require some kind of declaration to the effect that a
procedure is recursive.←.
As we introduce each new abstract data structure we add new operations tailored
to its needs. In adding sequences we introduced %3first, rest, null,#...%*.
We should also consider the question of introducing new control structures.
Again, with sequences we stayed with recursion, though perhaps
a simpler control structure which went down the sequence, selecting elements,
might be more in keeping with the data structure.
There is a
natural relationship between data structure and control structure;
sometimes we can exploit it to good measure. Looking ahead, the
iterator %3lit%* on {yon(P136)} is an example of a control regime
applicable to sequences.
When we consider abstract data structures in future chapters we will again
see the three components: data, operations, and control.
The new feature which we considered in discussing sequences was the problem
of representation. We showed how to represent sequences in terms of
S-expressions. We will continue this pyramiding of data structures in the future;
we will consider our work done as soon as we have a representation
of our new data structure in terms of an existing one. Finally
the suspense will become too great and we will exhibit a representation
of the underlying layer of S-expressions. Even later we will discuss
efficient representation of data structures, independent of their possible
S-expression representation. Indeed, there are lots of data
structures which are not best represented as S-expressions.
A further consideration appears because of the representational issue;
even though we have represented a particular data structure as a
complex S-expression we have %2no%* business operating on that representation
with S-expression functions. Please don't use %3car%* and %3cdr%* on
lists even though you know what the representation is.
You might have noticed
that in our representation of lists we can find the n%8th%*
element in a list by using %3cad%8n-1%*r%1. And you know
%3cadr%* is the second element, and that %3cdr%* is the %3rest%* of the list.
But please keep it a secret. These representation-dependent coding
tricks⊗↓called "puns" by C. Strachey← are dangerous. They are
really type faults as discussed on {yon(P131)} and {yon(P132)}.
While we are discussing some of the more practical implications of our work
we should not neglect %9B%1. How should it be understood?
As things currently stand, the appearance of %9B%1 in any application
of strict functions will immediately cause the termination of the computation.
No information other than the fact that %9B%1 did appear results from such
an occurrence. Indeed if we thought of the evaluation of %9B%1 as resulting in
a divergent computation, then no information at all would be forthcoming.
In reality a LISP implementation will handle computations which we include
within the province of %9B%1 as error conditions. The computation might
be terminated and an error message printed; in an interactive implementation
the user might be given an opportunity to correct the error and the computation
continued; and alas, some implementations just continue computation with
some arbitrary piece of information produced by an excursion into
the subconscious of LISP. We will have much more to say about
the implementation of %9B%1 in {yonss(P168)}.
Well, what does all this have to do with a course in data structures?
We will show quite soon that we can exploit abstraction as a means for
giving a clear specification of evaluation of LISP expressions, and
the representational techniques we will use will involve applications
of abstract data structures. A more tangible benefit perhaps should be
an increased awareness of the structure and behavior of programming languages,
and hopefully the beginnings of a better style of programming.
.P147:
Another part of our investigation should be to answer the question
"What is a data structure?".
As we mentioned at the beginning of {yonss(P100)} there is a different
characterization of sequences which will give a different interpretation
of data structures. The standard mathematical definition of
a sequence is as a function from the integers to a particular domain.
Thus a finite sequence %ds%* might be given as:
.BEGIN TURN OFF "{","}";CENTERIT;
←%ds%1 = %*{<1, s%41%*>, <2, s%42%*>, ...<n, s%4n%*>}
.END
To select components of %ds%*, we use ordinary function application:
%ds(i)%1#=#%*s%4i%1. Indeed, if you have programmed in a language which
has array constructs, you will recognize "application" as
the style of notation used.
However this is quite different from what we did in the section on sequences.
For example, if %2(A,#B,#C)%* is a sequence, %ds%*, then in the new interpretation
we should write:
.BEGIN TURN OFF "{","}";CENTERIT;
←%ds%1 = %*{<1, %2A%*>, <2, %2B%*>,<3, %2C%*>}
.END
Thus %ds(2)%1 is %2B%*, etc. What has happened is that what was previously
considered to be a data structure has become a function, and the selector
functions on the data structure have now become static indices on the
function.
Or to make things more transparent:
.BEGIN TURN OFF "{","}";CENTERIT;
←%ds%1 = %*{<%3first%*, %2A%*>, <%3second%*, %2B%*>,<%3third%*, %2C%*>}
.END
Then we would write %ds(%3first%*)%1 rather than %3first%d(s)%1⊗↓G. Steele
reports that PPL
(Polymorphic Programming Language) at Harvard lets you do this: %3car[s]%1 and
%3s[car]%1 both work.←.
This idea can easily be applied to S-exprs and their functions.
In graphical terms
we are representing the structures such that the arcs of the graph
are labeled with the selector indices. With L-trees
the labeling was implicit: left-branch was %3car%*; right-branch was %3cdr%*.
With explicit labels on the branches, the trees become unordered.
Several languages implement such unordered trees; they are called
%3structure%*s in Algol#68 and EL1, and called %3record%*s in Pascal.
Several formalisms exploit this view of data structures; in
particular the Vienna Definition Language, which is a direct descendant of LISP,
represents its data in such a manner.
What then is a data structure? It depends on how you look at it. For our
immediate purposes we will try to remain intuitive and informal.
We will try to characterize an abstract data structure as a domain and
a collection of associated operations and control structures. The operations and
control mechanisms should allow us to describe algorithms in a natural manner
but should, if at all possible, remain representation independent.
Now for some more mundane tips and tricks on LISP programming.
When evaluating or writing functions (or predicates) %2always%* keep in mind
any restrictions on the function: is it partial or total? is it defined only
for lists? are there restrictions on its arguments? When taking %3car%* or %3cdr%*, is the
argument non-atomic?
A few tricks were embedded in the problem sets.
Recall problem %28%* on {yon(P11)}. The composition %3atom[cons[ ...]]%*
will always evaluate to %ef%* ⊗↓%1If it has a value at all!
If the computation of the arguments to the %3cons%* does not
terminate or gives %9B%1 then we obviously won't
get %ef%*.←
since the result of %3cons%*-ing is always
non-atomic.
In %210.%*, we used atoms with the same
letter strings as predicate names, %3ATOM%* and %3EQ%*. %3ATOM%* and %3EQ%*
are perfectly good atoms, and are %2not%* the LISP predicates.
%214.%* shows that predicates are perfectly good expressions to evaluate; e%41%*
is a predicate. Similarly, %216.%* shows that some conditional expressions may
appear within a functional composition.
.BEGIN TURN ON "#";
Notice that %3twist%* in II is total whereas %3findem%* is partial.
%3findem%* is partial since %3y%* must be atomic. Both functions build
new trees: %3twist%* reverses left- and right-branches recursively;
%3findem%* builds a tree with the same branching structure as %3x%*,
but the terminal nodes contain %3T%* at the points where the atom %3y%*
appears in the original tree, and %3NIL%* otherwise.
.P24:
.END
Now for a representational trick:
on {yon(P12)} you should have discovered that the value of:
←%3cons[%9α%41%3; (%9α%42%3, ...%9α%4n%3)]%1 is: %3(%9α%41%3, %9α%42%3, ... %9α%4n%3).
%1Notice that %3list[%9α%41%3;(%9α%42%3, ... %9α%4n%3)]%1 is
%3(%9α%41%3 (%9α%42%3 ... %9α%4n%3))%1.
So %3cons%* will add a new element to the front of an existing list. %3list%*
will create a new list whose elements will be the values of the arguments
to %3list%*.
Be clear on the difference between the representation of the empty list, %3NIL%*, and
the list consisting of %3NIL%*, %3(NIL)%*; %3(NIL)%* is
an abbreviation for %3(NIL . NIL)%*, which certainly is not %3NIL%*.
List-notation is an abbreviation and can always be translated back
into an S-expr.
And an admonition:
though our representation of sequences is such that %3first%*, %3rest%*
and %3concat%*
are identical to %3car%*, %3cdr%*, and %3cons%* respectively, we should
use the names %3first%*, %3rest%*, and %3concat%*
to make it clear that we are operating on lists.
.END